# Scikit-LLM
* https://github.com/iryna-kondr/scikit-llm

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fuyu-quant/data-science-wiki/blob/main/Natural_Language_processing/Classification(japanese)/scikit-llm.ipynb)

In [1]:
%%capture
!pip install scikit-llm

In [6]:
from skllm import ZeroShotGPTClassifier
from skllm.datasets import get_classification_dataset
from skllm.config import SKLLMConfig

from sklearn.model_selection import train_test_split

SKLLMConfig.set_openai_key("YOUR_API_KEY")
SKLLMConfig.set_openai_org("YOUR_ORG_ID")
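
Hard-coded credentials are easy to leak when a notebook is shared. The following is a minimal sketch of reading them from the environment instead. The variable names `OPENAI_API_KEY` and `OPENAI_ORG_ID` and the helper `read_openai_credentials` are assumptions made for illustration, not part of scikit-llm.

```python
import os


def read_openai_credentials(env=os.environ):
    """Return (api_key, org_id) read from a mapping such as os.environ.

    The variable names are an assumed convention, not a scikit-llm requirement.
    """
    api_key = env.get("OPENAI_API_KEY")
    org_id = env.get("OPENAI_ORG_ID")  # None when no organization ID is set
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return api_key, org_id


# Possible usage with the configuration calls shown above (kept as comments so
# the sketch runs without scikit-llm installed or a key configured):
# api_key, org_id = read_openai_credentials()
# SKLLMConfig.set_openai_key(api_key)
# SKLLMConfig.set_openai_org(org_id)
```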

In [10]:
X, y = get_classification_dataset()
print(len(X))
print(X[:3])
print(y[:3])

30
["I was absolutely blown away by the performances in 'Summer's End'. The acting was top-notch, and the plot had me gripped from start to finish. A truly captivating cinematic experience that I would highly recommend.", "The special effects in 'Star Battles: Nebula Conflict' were out of this world. I felt like I was actually in space. The storyline was incredibly engaging and left me wanting more. Excellent film.", "'The Lost Symphony' was a masterclass in character development and storytelling. The score was hauntingly beautiful and complimented the intense, emotional scenes perfectly. Kudos to the director and cast for creating such a masterpiece."]
['positive', 'positive', 'positive']


In [21]:
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=42)

### ZeroShotGPTClassifier (with labeled training data)

In [22]:
clf = ZeroShotGPTClassifier(openai_model="gpt-3.5-turbo")
clf.fit(train_X, train_y)

In [23]:
labels = clf.predict(test_X)

100%|██████████| 6/6 [00:07<00:00,  1.17s/it]


In [24]:
print(test_y)
print(labels)

['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']
['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']
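
Comparing the two printed lists by eye works for six samples, but a numeric score is easier to track. The sketch below computes plain accuracy on the values copied from the output above. The helper `accuracy` is written out for illustration; `sklearn.metrics.accuracy_score` should give the same number.

```python
# Values copied from the printed output above.
test_y = ['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']
labels = ['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


score = accuracy(test_y, labels)
print(f"accuracy = {score:.2f}")  # accuracy = 1.00
```

With only six test samples, a perfect score says little about how the classifier would behave on a larger dataset.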


### ZeroShotGPTClassifier (without labeled training data)

In [26]:
clf = ZeroShotGPTClassifier()
clf.fit(None, ["positive", "negative", "neutral"])
labels = clf.predict(test_X)

100%|██████████| 6/6 [00:07<00:00,  1.20s/it]


In [27]:
print(test_y)
print(labels)

['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']
['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']
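
Since no training text is passed to `fit` here, the candidate list is the only information the classifier has about the label space. A small sanity check, sketched below on the values printed above, confirms that every prediction is one of the candidates. Whether the library itself guarantees this is not assumed.

```python
# Candidate labels as passed to fit(), and predictions copied from the output above.
candidate_labels = ["positive", "negative", "neutral"]
labels = ['neutral', 'negative', 'neutral', 'negative', 'positive', 'positive']

# Any predicted label that is not in the candidate list ends up here.
unexpected = sorted(set(labels) - set(candidate_labels))
print(unexpected)  # []
```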


### Multi-Label Zero-Shot Text Classification (with labeled training data)

In [3]:
from skllm import MultiLabelZeroShotGPTClassifier
from skllm.datasets import get_multilabel_classification_dataset

In [40]:
X, y = get_multilabel_classification_dataset()

print(len(X))
print(X[:3])
print(y[:3])

10
['The product was of excellent quality, and the packaging was also very good. Highly recommend!', 'The delivery was super fast, but the product did not match the information provided on the website.', 'Great variety of products, but the customer support was quite unresponsive.']
[['Quality', 'Packaging'], ['Delivery', 'Product Information'], ['Product Variety', 'Customer Support']]


In [41]:
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=42)

In [42]:
clf = MultiLabelZeroShotGPTClassifier(max_labels=3)
clf.fit(train_X, train_y)
labels = clf.predict(test_X)

100%|██████████| 2/2 [00:03<00:00,  1.67s/it]


In [43]:
print(test_y)
print(labels)

[['Price', 'Quality', 'User Experience'], ['Delivery', 'Product Information']]
[['Quality', 'Price', 'User Experience'], ['Delivery', 'Product Information']]
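
The first prediction contains the same labels as the ground truth but in a different order, so a direct list comparison would report a mismatch. A minimal sketch of an order-insensitive comparison follows. The helper `exact_match_ratio` is a hypothetical name used only for this example.

```python
# Values copied from the printed output above.
test_y = [['Price', 'Quality', 'User Experience'], ['Delivery', 'Product Information']]
labels = [['Quality', 'Price', 'User Experience'], ['Delivery', 'Product Information']]


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true label set."""
    matches = sum(set(t) == set(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


print(test_y == labels)                   # False: list equality is order-sensitive
print(exact_match_ratio(test_y, labels))  # 1.0
```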


### Multi-Label Zero-Shot Text Classification (without labeled training data)

In [32]:
candidate_labels = [
    "Quality",
    "Price",
    "Delivery",
    "Service",
    "Product Variety",
    "Customer Support",
    "Packaging",
    "User Experience",
    "Return Policy",
    "Product Information",
]

clf = MultiLabelZeroShotGPTClassifier(max_labels=3)
clf.fit(None, [candidate_labels])
labels = clf.predict(test_X)

100%|██████████| 6/6 [00:41<00:00,  6.97s/it]


In [44]:
print(test_X)

['The prices are a bit high. However, the product quality and user experience are worth it.', 'The delivery was super fast, but the product did not match the information provided on the website.']


In [34]:
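# Note: produced by In [32], before the In [41] split, so these are predictions for the six movie reviews, not the two product reviews above.
print(labels)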

[['Quality'], ['Quality'], ['Product Information'], ['Quality', 'Product Information'], ['Quality', 'Price'], ['Product Information']]
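
The printed predictions can be summarized with a short script: one check that no sample exceeds the `max_labels=3` cap, and a count of how often each label was assigned. This is a sketch on the values copied from the output above, not part of the original notebook.

```python
from collections import Counter

# Predictions copied from the printed output above.
labels = [['Quality'], ['Quality'], ['Product Information'],
          ['Quality', 'Product Information'], ['Quality', 'Price'],
          ['Product Information']]
max_labels = 3  # same cap that was passed to the classifier

# Largest number of labels assigned to any single sample.
largest = max(len(sample) for sample in labels)
print(largest <= max_labels)  # True

# How often each label appears across all samples.
counts = Counter(label for sample in labels for label in sample)
print(counts.most_common())
# [('Quality', 4), ('Product Information', 3), ('Price', 1)]
```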


In [8]:
# Japanese search-style queries: "Hawaii trip, cheap", "watching soccer", "ramen Tokyo"
test_X = ['ハワイ旅行　格安', 'サッカー観戦', 'ラーメン 東京']

In [9]:
candidate_labels = [
    "男",        # man
    "女",        # woman
    "若いもの",  # young people
    "高齢者",    # elderly
    "子供",      # children
    "スポーツ",  # sports
    "旅行",      # travel
    "家電",      # home appliances
    "食品",      # food
]

clf = MultiLabelZeroShotGPTClassifier(max_labels=3)
clf.fit(None, [candidate_labels])
labels = clf.predict(test_X)

100%|██████████| 3/3 [00:03<00:00,  1.04s/it]


In [10]:
print(labels)

[['旅行'], ['スポーツ'], ['食品']]
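
To read the result more easily, each query can be paired with its predicted labels. The sketch below uses the values copied from the cells above. The name `paired` is introduced only for this example.

```python
# Queries and predictions copied from the cells above.
test_X = ['ハワイ旅行　格安', 'サッカー観戦', 'ラーメン 東京']
labels = [['旅行'], ['スポーツ'], ['食品']]

# Map each query to the labels predicted for it.
paired = {query: predicted for query, predicted in zip(test_X, labels)}

for query, predicted in paired.items():
    print(f"{query} -> {', '.join(predicted)}")
```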
