# Scikit-LLM: Sklearn Meets Large Language Models

[Scikit-LLM](https://github.com/iryna-kondr/scikit-llm) is a Python package that integrates large language models such as ChatGPT into scikit-learn. It exposes them through the familiar estimator interface (`fit`/`predict`) for text analysis tasks.

## Installation 💾

```bash
pip install scikit-llm
```


## Features ⚡
- zero-shot text classification
- few-shot text classification
- text vectorization
- text summarization
- language translation



## How does it work? 🙋‍♀️

**zero-shot**: Scikit-LLM will automatically query the OpenAI API and transform the response into a regular list of labels.

**few-shot**: the training samples are added to the prompt, which is then passed to the model.
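To make the difference between the two modes concrete, here is a minimal sketch of how a classification prompt might be assembled. With no examples it is zero-shot, and with `(sample, label)` pairs it is few-shot. The function name `build_prompt`, the template wording and the JSON answer format are hypothetical stand-ins, not Scikit-LLM's actual prompt.

```python
# A minimal sketch of zero-/few-shot prompt construction.
# Assumption: the template text and `build_prompt` are hypothetical,
# NOT the prompt Scikit-LLM actually sends to the OpenAI API.
from typing import Sequence, Tuple


def build_prompt(
    text: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Return a classification prompt; with no examples it is zero-shot."""
    lines = [
        "Classify the text into one of the following categories: "
        + ", ".join(labels) + ".",
    ]
    if examples:
        lines.append("Training examples:")
        # Few-shot: each (sample, label) pair from the training set is inlined verbatim.
        for sample, label in examples:
            lines.append(f'Text: "{sample}" -> Label: {label}')
    lines.append(f'Text to classify: "{text}"')
    # Asking for a structured answer makes the reply easier to parse into a label.
    lines.append('Respond with JSON of the form {"label": "<category>"}.')
    return "\n".join(lines)
```

Because every training sample is copied into the prompt, plain few-shot prompts grow with the size of the training set, which is the problem the dynamic variant addresses.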

**dynamic few-shot**:
- during fitting, the "training" data is partitioned by class, vectorized, and stored.
- during inference, the classifier retrieves the nearest neighbors of each input from the stored data, so only the most similar examples are included in the prompt.
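The fit/inference split above can be sketched as follows. This is a toy under explicit assumptions: a bag-of-words vectorizer with cosine similarity stands in for the embedding model Scikit-LLM actually uses, and `DynamicExampleSelector`, `vectorize` and `n_per_class` are hypothetical names.

```python
# A minimal sketch of dynamic few-shot example selection.
# Assumption: a toy bag-of-words vectorizer replaces the real embedding model;
# all names here are hypothetical, not Scikit-LLM's API.
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def vectorize(text: str) -> Counter:
    # Lowercased word counts, with punctuation stripped from word edges.
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class DynamicExampleSelector:
    def fit(self, X: Sequence[str], y: Sequence[str]) -> "DynamicExampleSelector":
        # Fitting: partition the "training" data by class and store each text with its vector.
        self.partitions_: Dict[str, List[Tuple[str, Counter]]] = defaultdict(list)
        for text, label in zip(X, y):
            self.partitions_[label].append((text, vectorize(text)))
        return self

    def select(self, text: str, n_per_class: int = 1) -> List[Tuple[str, str]]:
        # Inference: within each class, keep only the n nearest neighbors of the input.
        query = vectorize(text)
        chosen: List[Tuple[str, str]] = []
        for label, items in self.partitions_.items():
            ranked = sorted(items, key=lambda item: cosine(query, item[1]), reverse=True)
            chosen.extend((sample, label) for sample, _ in ranked[:n_per_class])
        return chosen
```

Selecting neighbors per class, rather than globally, guarantees that every label still appears in the prompt, while keeping the prompt size fixed at `n_per_class` times the number of classes.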

## Tutorial 📖
In this tutorial, we briefly introduce **Zero-Shot Text Classification**.

One of ChatGPT's most useful features is its ability to classify text without being retrained. The only requirement is that the labels be descriptive (for example, `positive`, `negative` and `neutral` rather than opaque codes such as `0`, `1` and `2`).

The `ZeroShotGPTClassifier` class lets you create such a model as a regular scikit-learn classifier, with the usual `fit` and `predict` methods.

### Configuring OpenAI API Key

```python
from skllm import ZeroShotGPTClassifier
from skllm.config import SKLLMConfig

# Replace the placeholders with your own OpenAI credentials
SKLLMConfig.set_openai_key("your openai key")
SKLLMConfig.set_openai_org("your openai org")

clf = ZeroShotGPTClassifier(openai_model="gpt-3.5-turbo")
```

### Zero-Shot Text Classification

Scikit-LLM will automatically query the OpenAI API and transform the response into a regular list of labels.

Scikit-LLM also checks that each response contains a valid label. If it does not, a label is selected at random, with each label's probability proportional to how often it occurs in the training set.
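The validation-and-fallback behavior described above could look roughly like this. It is a sketch, not Scikit-LLM's internals: it assumes the model was asked to answer with JSON such as `{"label": "positive"}`, and `label_distribution` and `parse_label` are hypothetical names.

```python
# A minimal sketch of response validation with a frequency-weighted random fallback.
# Assumption: the model replies with JSON like {"label": "positive"};
# the function names are hypothetical, not Scikit-LLM's API.
import json
import random
from collections import Counter
from typing import List, Sequence, Tuple


def label_distribution(y: Sequence[str]) -> Tuple[List[str], List[float]]:
    # Computed at fit time: each label's probability is its share of the training labels.
    counts = Counter(y)
    classes = sorted(counts)
    return classes, [counts[c] / len(y) for c in classes]


def parse_label(
    response: str,
    classes: Sequence[str],
    probabilities: Sequence[float],
    rng: random.Random,
) -> str:
    try:
        label = json.loads(response).get("label")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        label = None
    if label in classes:
        return label
    # Invalid or missing label: sample one in proportion to its training frequency.
    return rng.choices(classes, weights=probabilities, k=1)[0]
```

Falling back to the training-set label distribution means an unparseable answer still yields a valid label, biased toward the most common class rather than chosen uniformly.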

```python
movie_reviews = [
    "This movie was absolutely wonderful. The storyline was compelling and the characters were very realistic.",
    "I really loved the film! The plot had a few unexpected twists which kept me engaged till the end.",
    "The movie was alright. Not great, but not bad either. A decent one-time watch.",
    "I didn't enjoy the film that much. The plot was quite predictable and the characters lacked depth.",
    "This movie was not to my taste. It felt too slow and the storyline wasn't engaging enough.",
    "The film was okay. It was neither impressive nor disappointing. It was just fine.",
    "I was blown away by the movie! The cinematography was excellent and the performances were top-notch.",
    "I didn't like the movie at all. The story was uninteresting and the acting was mediocre at best.",
    "The movie was decent. It had its moments but was not consistently engaging."
]

movie_review_labels = [
    "positive",
    "positive",
    "neutral",
    "negative",
    "negative",
    "neutral",
    "positive",
    "negative",
    "neutral"
]

new_movie_reviews = [
    # A positive review
    "The movie was fantastic! I was captivated by the storyline from beginning to end.",

    # A negative review
    "I found the film to be quite boring. The plot moved too slowly and the acting was subpar.",

    # A neutral review
    "The movie was okay. Not the best I've seen, but certainly not the worst."
]
```

```python
# "Fit" the classifier: for zero-shot classification no model weights are trained;
# the labels (and their frequencies, used for the random fallback) are recorded
clf.fit(X=movie_reviews, y=movie_review_labels)

# Use the fitted classifier to predict the sentiment of the new reviews
predicted_movie_review_labels = clf.predict(X=new_movie_reviews)

for review, sentiment in zip(new_movie_reviews, predicted_movie_review_labels):
    print(f"Review: {review}\nPredicted Sentiment: {sentiment}\n\n")
```

```
100%|██████████| 3/3 [00:04<00:00,  1.43s/it]

Review: The movie was fantastic! I was captivated by the storyline from beginning to end.
Predicted Sentiment: positive


Review: I found the film to be quite boring. The plot moved too slowly and the acting was subpar.
Predicted Sentiment: negative


Review: The movie was okay. Not the best I've seen, but certainly not the worst.
Predicted Sentiment: neutral
```