huggingface-example

Sample NLP streaming workflow using an LLM from Hugging Face and PyEnsign

This is an example of a sentiment analysis application built with Hugging Face and PyEnsign, using sample Yelp ratings data from Kaggle.

To use PyEnsign, create a free account on rotational.app. You will need to do the following once you create an account:

Create a project.
Add the following topic to the project: yelp_data. Check out this video on how to add a topic. You can choose your own name for the topic, but make sure that you update the code accordingly.
Generate API keys for your project.

You will need to create and source the following environment variables prior to running the example:

export ENSIGN_CLIENT_ID="your client id here"
export ENSIGN_CLIENT_SECRET="your client secret here"
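
Before starting any of the components, it can help to confirm that both variables are visible to Python. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and is not part of PyEnsign or this repository.

```python
import os

# The two variable names come from the export commands above.
REQUIRED_VARS = ("ENSIGN_CLIENT_ID", "ENSIGN_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report which variables (if any) still need to be exported.
missing = missing_credentials()
print("Missing variables:", ", ".join(missing) if missing else "none")
```

If the output lists either variable, export it in the same terminal session before running the scripts.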

This application consists of three components:

Trainer reads data from the yelp_train.csv file and builds a model using the pretrained DistilBERT LLM from Hugging Face. The best model gets written to the final_model directory.
ScoreDataPublisher reads data from the yelp_score.csv file and publishes it to the yelp_data topic.
Scorer listens for new messages in the yelp_data topic. When it receives a new message, it uses the trained Hugging Face model in the final_model directory to make predictions.
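
The publish/score split described above can be sketched without any external services. In the illustration below, an `asyncio.Queue` stands in for the yelp_data topic; it does not use the PyEnsign API. The JSON encoding, the `text` field name, and the keyword-based `predict_sentiment` function are all assumptions made for the sketch, not details taken from the repository.

```python
import asyncio
import json


def predict_sentiment(text):
    """Hypothetical stand-in for the fine-tuned model in final_model."""
    return "positive" if "good" in text.lower() else "negative"


async def publish_rows(topic, rows):
    """Play the ScoreDataPublisher role: one encoded message per row."""
    for row in rows:
        await topic.put(json.dumps(row).encode("utf-8"))
    await topic.put(None)  # sentinel so this finite demo can stop


async def score_messages(topic):
    """Play the Scorer role: decode each message and predict as it arrives."""
    results = []
    while (payload := await topic.get()) is not None:
        row = json.loads(payload.decode("utf-8"))
        results.append((row["text"], predict_sentiment(row["text"])))
    return results


async def run_demo(rows):
    topic = asyncio.Queue()  # stands in for the yelp_data topic
    _, results = await asyncio.gather(
        publish_rows(topic, rows), score_messages(topic)
    )
    return results


def demo(rows):
    """Run the publisher and scorer concurrently and return the predictions."""
    return asyncio.run(run_demo(rows))
```

Calling `demo([{"text": "Good food"}])` runs both roles concurrently. In the real application the two sides run as separate processes and communicate only through the topic, which is why the Scorer can be started before any data is published.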

Steps to run the application

Create a virtual environment

$ virtualenv venv

Activate the virtual environment

$ source venv/bin/activate

Install the required packages

$ pip install -r requirements.txt

Open three terminal windows

Run the Trainer in the first window (make sure to activate the virtual environment first). This will create three checkpoint directories under the `trained_models` directory and the final model configurations and weights in the `final_model` directory.

$ source venv/bin/activate

$ python huggingface_trainer.py
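
To confirm that training has finished before starting the Scorer, you can inspect the two output directories. This is a rough sketch under two assumptions: that the checkpoint directories follow the `checkpoint-<step>` naming that the Hugging Face `Trainer` uses by default, and that a non-empty `final_model` directory means the model was saved. The helper name `training_outputs` is hypothetical.

```python
from pathlib import Path


def training_outputs(root="."):
    """Summarize what the Trainer left behind under the given directory."""
    root = Path(root)
    trained = root / "trained_models"
    checkpoints = []
    if trained.is_dir():
        # Assumed naming: checkpoint-<step>, one directory per saved checkpoint.
        checkpoints = sorted(
            p.name for p in trained.glob("checkpoint-*") if p.is_dir()
        )
    final_model = root / "final_model"
    # A non-empty final_model directory suggests the model was written.
    final_ready = final_model.is_dir() and any(final_model.iterdir())
    return {"checkpoints": checkpoints, "final_model_ready": final_ready}


# Check the current working directory, where the Trainer was run.
print(training_outputs())
```

If `final_model_ready` is `False`, the Trainer has most likely not finished yet.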

Once the training is complete, run the Scorer in the second window (make sure to activate the virtual environment first).

$ source venv/bin/activate

$ python huggingface_scorer.py score

Run the ScoreDataPublisher in the third window (make sure to activate the virtual environment first)

$ source venv/bin/activate

$ python huggingface_scorer.py score_data

