# Instructions
These instructions explain how to train an LLM on the Netflix titles dataset and how to set up an API around it, so that a prompt can be sent and a response retrieved with a curl command.

### trainer.py
The directory includes "trainer.py", which trains the model on the provided dataset. The dataset is stored in the same directory as "netflix_titles.csv". The model is trained solely on the description column of that CSV file.
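
As a rough illustration of what training "solely on the description column" involves, the sketch below pulls that column out of the CSV with Python's standard `csv` module. The function name `load_descriptions` is hypothetical and not taken from "trainer.py", which may well load the data differently (for example with pandas).

```python
import csv


def load_descriptions(fp, column="description"):
    """Return the non-empty values of one column from a CSV file."""
    with open(fp, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Skip rows where the column is missing or blank.
        return [
            row[column].strip()
            for row in reader
            if (row.get(column) or "").strip()
        ]
```

Calling `load_descriptions("netflix_titles.csv")` would then return a list of description strings ready to be tokenized.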

To start training, run "trainer.py", passing the path to the dataset with the `--fp` argument and the learning rate with the `--lr` argument:

In [None]:
!python trainer.py --fp 'netflix_titles.csv' --lr 0.001

Training then starts, and the resulting weights and configuration file are saved to the "/out/netflixdata" directory.
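
The `--fp` and `--lr` flags in the command above could be parsed roughly as follows. This is a minimal sketch using `argparse`; the default value, the help strings, and the `parse_args` helper are assumptions for illustration, and the actual argument handling in "trainer.py" may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the dataset path and learning rate from the command line."""
    parser = argparse.ArgumentParser(description="Train a model on a CSV dataset.")
    # Flag names mirror the command shown above; the default is illustrative only.
    parser.add_argument("--fp", required=True, help="path to the dataset CSV")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    return parser.parse_args(argv)
```

Declaring `--lr` with `type=float` means a malformed value such as `--lr fast` is rejected before any training work begins.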

### server.py
Once training is complete, the model can be served for inference with "server.py". To launch the server, run the following command:

In [None]:
!python -m uvicorn server:app --reload

After starting the server, we can send a request with curl and read the generated output:

In [None]:
!curl -i -X POST -H "Content-Type: application/json" -d "{\"text\": \"NimbleBox!\", \"max_length\": 1000}" http://127.0.0.1:8000/generatetext
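
The same request can also be sent from Python, which avoids shell-specific quoting of the JSON body. The sketch below uses only the standard library; the helper names `build_request` and `generate_text` are hypothetical, and because the response format of "server.py" is not described here, the body is returned as raw text.

```python
import json
import urllib.request

URL = "http://127.0.0.1:8000/generatetext"


def build_request(text, max_length, url=URL):
    """Build a POST request carrying the same JSON fields as the curl call."""
    payload = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(text, max_length=1000, url=URL, timeout=60):
    """Send the request and return the response body as a string."""
    request = build_request(text, max_length, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(generate_text("NimbleBox!"))` prints whatever the endpoint returns.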

Next, we use multithreading both to stress test the server and to run several prompts concurrently from the command-line interface (CLI).

### test.py

"test.py" uses multithreading to simulate multiple concurrent requests to the server. This stress test helps evaluate the server's performance and its ability to handle a high volume of simultaneous requests.
To run the stress test, use this command:

In [None]:
!python test.py --url http://127.0.0.1:8000/generatetext --threads 10 --requests 10 --max_length 500
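
One way such a stress test can be structured is with a thread pool that fires requests concurrently and records each latency. The sketch below is not the code in "test.py": `stress_test` and its `send` parameter are hypothetical, and how its `threads` and `total_requests` arguments map onto `--threads` and `--requests` is an assumption, since the source does not say whether `--requests` is a total or a per-thread count.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stress_test(send, threads=10, total_requests=100):
    """Call send() total_requests times across a pool of worker threads.

    send is any zero-argument callable that performs one request, for
    example an HTTP POST to the /generatetext endpoint. Returns the
    per-call latencies in seconds and the number of calls that raised.
    """

    def timed_call(_):
        start = time.perf_counter()
        try:
            send()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = [elapsed for elapsed, _ in results]
    failures = sum(1 for _, ok in results if not ok)
    return latencies, failures
```

Taking the request function as a parameter keeps the timing logic independent of any particular HTTP client, so it can be exercised with a stub before pointing it at a live server.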

To run multi-threaded inference on several prompts at once, pass them with `--messages`:

In [None]:
!python test.py --url http://127.0.0.1:8000/generatetext --messages "This" "is" "my" "Assignment" "for" "NimbleBox" "Internship" --max_length 500
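
For the multi-prompt case, `ThreadPoolExecutor.map` is a convenient fit because it yields results in the same order as the inputs even when the calls finish out of order. The sketch below assumes a hypothetical `generate` callable that sends one prompt to the server and returns its output; how "test.py" actually dispatches `--messages` is not shown in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_many(generate, messages, max_length=500, max_workers=None):
    """Run generate(message, max_length) for every message concurrently.

    messages is a list of prompt strings. Returns a list of
    (message, output) pairs in the original message order.
    """
    if not messages:
        return []
    # Default to one worker per message (an illustrative choice).
    workers = max_workers or len(messages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = pool.map(lambda m: generate(m, max_length), messages)
        return list(zip(messages, outputs))
```

One worker per prompt is reasonable for a handful of messages; for long prompt lists, capping `max_workers` avoids overwhelming the server.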