# Fortnight 1: Applied Exploration


### Demoed for: Conrad, Saul, Katja, Caitlyn

## Applied Exploration
### *pulled from F1_2*

Go to the Hugging Face models page: https://huggingface.co/models
* Click `Text Classification`
* Find a model, and a dataset appropriate for testing it, that differ from the ones we worked with today
    - many models link to the datasets they were trained on, but you can find others at https://huggingface.co/datasets
    - write down some info about the model you found
        - what is it for?
        - who made it?
        - what kind of data was it trained on?
        - is it based on some other model and trained on new data (*fine-tuned*) for a specific task?
    - write down some info on the dataset you found
        - where did it come from?
        - how big is it?
        - what kind of labels does it contain?
* Evaluate the performance 
    - use some of the metrics we talked about today
    - describe in your own words how it performed
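
The evaluation step above can be sketched in plain Python. This is a minimal sketch that assumes the metrics in question include accuracy, precision, and recall; `per_class_metrics` is a hypothetical helper name, and the labels are made-up toy values, not results from any model.

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Return overall accuracy plus precision and recall for each label."""
    labels = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))

    accuracy = sum(a == p for a, p in pairs) / len(pairs)

    # Count how often each label was predicted correctly, appeared, or was predicted
    true_positives = Counter(a for a, p in pairs if a == p)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)

    report = {}
    for label in labels:
        precision = (
            true_positives[label] / predicted_counts[label]
            if predicted_counts[label]
            else 0.0
        )
        recall = (
            true_positives[label] / actual_counts[label]
            if actual_counts[label]
            else 0.0
        )
        report[label] = {"precision": precision, "recall": recall}
    return accuracy, report


# Hypothetical toy labels for illustration only
actual = ["negative", "negative", "neutral", "positive"]
predicted = ["negative", "positive", "neutral", "positive"]
accuracy, report = per_class_metrics(actual, predicted)
print(accuracy)
print(report)
```

Per-label precision and recall can reveal problems that a single accuracy number hides, for example a model that rarely predicts one of the classes.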
    

In [10]:
import sys
!{sys.executable} -m pip install transformers datasets



In [11]:
!{sys.executable} -m pip uninstall transformers datasets

Found existing installation: transformers 4.33.2
Uninstalling transformers-4.33.2:
  Would remove:
    /Users/elijahlueders/Library/Caches/pypoetry/virtualenvs/nlp-local-JrpFJHsc-py3.10/bin/transformers-cli
    /Users/elijahlueders/Library/Caches/pypoetry/virtualenvs/nlp-local-JrpFJHsc-py3.10/lib/python3.10/site-packages/transformers-4.33.2.dist-info/*
    /Users/elijahlueders/Library/Caches/pypoetry/virtualenvs/nlp-local-JrpFJHsc-py3.10/lib/python3.10/site-packages/transformers/*
Proceed (Y/n)? ^C
ERROR: Operation cancelled by user

In [13]:
!{sys.executable} -m pip install scikit-learn



In [14]:
from datasets import load_dataset

dataset = load_dataset("amazon_reviews_multi", "en")

In [15]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 200000
    })
    validation: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 5000
    })
})


In [16]:
print(type(dataset))

<class 'datasets.dataset_dict.DatasetDict'>


In [17]:
print(dataset["test"])

Dataset({
    features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
    num_rows: 5000
})


In [18]:
type(dataset["test"]["review_id"]) 


list

In [19]:
dataset["test"].features

{'review_id': Value(dtype='string', id=None),
 'product_id': Value(dtype='string', id=None),
 'reviewer_id': Value(dtype='string', id=None),
 'stars': Value(dtype='int32', id=None),
 'review_body': Value(dtype='string', id=None),
 'review_title': Value(dtype='string', id=None),
 'language': Value(dtype='string', id=None),
 'product_category': Value(dtype='string', id=None)}

In [20]:
# loop through features and print each type
for feature_name in dataset["test"].features:
    print(type(dataset["test"][feature_name]))


<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>
<class 'list'>


In [21]:
dataset["test"][0:5]

{'review_id': ['en_0199937',
  'en_0863335',
  'en_0565010',
  'en_0963290',
  'en_0238156'],
 'product_id': ['product_en_0902516',
  'product_en_0348072',
  'product_en_0356154',
  'product_en_0583322',
  'product_en_0487636'],
 'reviewer_id': ['reviewer_en_0097389',
  'reviewer_en_0601537',
  'reviewer_en_0970351',
  'reviewer_en_0216125',
  'reviewer_en_0514203'],
 'stars': [1, 1, 1, 1, 1],
 'review_body': ['These are AWFUL. They are see through, the fabric feels like tablecloth, and they fit like children’s clothing. Customer service did seem to be nice though, but I regret missing my return date for these. I wouldn’t even donate them because the quality is so poor.',
  'I bought 4 and NONE of them worked. Yes I used new batteries!',
  "On first use it didn't heat up and now it doesn't work at all",
  "You want an HONEST answer? I just returned from UPS where I returned the FARCE of an earring set to Amazon. It did NOT look like what I saw on Amazon. Only a baby would be able to we

In [22]:
dataset["test"].features


{'review_id': Value(dtype='string', id=None),
 'product_id': Value(dtype='string', id=None),
 'reviewer_id': Value(dtype='string', id=None),
 'stars': Value(dtype='int32', id=None),
 'review_body': Value(dtype='string', id=None),
 'review_title': Value(dtype='string', id=None),
 'language': Value(dtype='string', id=None),
 'product_category': Value(dtype='string', id=None)}

In [23]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

sample_size = 10
model_name = "philschmid/distilbert-base-multilingual-cased-sentiment-2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Read each column once; dataset["test"][column] builds the full list on every access
reviews = dataset["test"]["review_body"][0:sample_size]
stars = dataset["test"]["stars"][0:sample_size]

results = classifier(reviews)
print(f"Here are the {len(results)} predictions")

total_correct = 0
for idx in range(sample_size):
    print(f"-------\nSample {idx+1} of {sample_size}:")
    print(reviews[idx])

    actual_label_numeric = stars[idx]
    print(f"Stars: {actual_label_numeric}")

    # Map the star rating onto a sentiment label: 3 is neutral, above 3 positive, below 3 negative
    if actual_label_numeric == 3:
        actual_label = "neutral"
    elif actual_label_numeric > 3:
        actual_label = "positive"
    else:
        actual_label = "negative"
    print(f"Actual label: {actual_label}")

    predicted_label = results[idx]["label"]
    print(f"Predicted label: {predicted_label}")

    if predicted_label == actual_label:
        total_correct += 1

print(f"-------\nAccuracy: {total_correct / sample_size}")

Here are the 10 predictions
-------
Sample 1 of 10:
These are AWFUL. They are see through, the fabric feels like tablecloth, and they fit like children’s clothing. Customer service did seem to be nice though, but I regret missing my return date for these. I wouldn’t even donate them because the quality is so poor.
Stars: 1
Actual label: negative
Predicted label: negative
-------
Sample 2 of 10:
I bought 4 and NONE of them worked. Yes I used new batteries!
Stars: 1
Actual label: negative
Predicted label: positive
-------
Sample 3 of 10:
On first use it didn't heat up and now it doesn't work at all
Stars: 1
Actual label: negative
Predicted label: negative
-------
Sample 4 of 10:
You want an HONEST answer? I just returned from UPS where I returned the FARCE of an earring set to Amazon. It did NOT look like what I saw on Amazon. Only a baby would be able to wear the size of the earring. They were SO small. the size of a pin head I at first thought Amazon had forgotten to enclose them in th

In [24]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

sample_size = 1000
model_name = "philschmid/distilbert-base-multilingual-cased-sentiment-2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Read each column once; dataset["test"][column] builds the full list on every access
reviews = dataset["test"]["review_body"][0:sample_size]
stars = dataset["test"]["stars"][0:sample_size]

results = classifier(reviews)

total_correct = 0
for idx in range(sample_size):
    actual_label_numeric = stars[idx]

    # Map the star rating onto a sentiment label: 3 is neutral, above 3 positive, below 3 negative
    if actual_label_numeric == 3:
        actual_label = "neutral"
    elif actual_label_numeric > 3:
        actual_label = "positive"
    else:
        actual_label = "negative"

    predicted_label = results[idx]["label"]

    if predicted_label == actual_label:
        total_correct += 1

print(f"For {len(results)} samples, the accuracy is {total_correct / sample_size}")

For 1000 samples, the accuracy is 0.929


In [25]:
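def calculate_accuracy(results, dataset):
    """Return the fraction of predictions that match the star-derived labels.

    Assumes results[i] is the prediction for row i of dataset["test"].
    """
    # Read the column once instead of on every loop iteration
    stars = dataset["test"]["stars"]
    total_correct = 0
    sample_size = len(results)

    for idx in range(sample_size):
        actual_label_numeric = stars[idx]

        # 3 stars is neutral, above 3 positive, below 3 negative
        if actual_label_numeric == 3:
            actual_label = "neutral"
        elif actual_label_numeric > 3:
            actual_label = "positive"
        else:
            actual_label = "negative"

        predicted_label = results[idx]["label"]

        if predicted_label == actual_label:
            total_correct += 1

    return total_correct / sample_size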
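

`calculate_accuracy` depends on the rows of `dataset["test"]` lining up with `results`. A possible variant, sketched here under the assumption that the caller passes the matching star ratings directly, makes that pairing explicit. `stars_to_label` and `accuracy_from_stars` are hypothetical names, and the predictions below are made up for illustration.

```python
def stars_to_label(stars):
    """Map a 1-5 star rating to the sentiment label used in this notebook."""
    if stars == 3:
        return "neutral"
    return "positive" if stars > 3 else "negative"


def accuracy_from_stars(results, stars):
    """Fraction of predictions whose label matches the star-derived label."""
    if len(results) != len(stars):
        raise ValueError("results and stars must have the same length")
    correct = sum(
        result["label"] == stars_to_label(rating)
        for result, rating in zip(results, stars)
    )
    return correct / len(results)


# Hypothetical predictions shaped like the pipeline output (label plus score)
fake_results = [
    {"label": "negative", "score": 0.98},
    {"label": "positive", "score": 0.71},
    {"label": "neutral", "score": 0.55},
    {"label": "positive", "score": 0.93},
]
fake_stars = [1, 2, 3, 5]
print(accuracy_from_stars(fake_results, fake_stars))
```

With the real data the call would look like `accuracy_from_stars(results, dataset["test"]["stars"][0:sample_size])`.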


In [27]:
calculate_accuracy(results, dataset)

0.929

In [28]:
sample_size = 1500
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 1500 samples, the accuracy is 0.862


In [29]:
sample_size = 2000
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 2000 samples, the accuracy is 0.837


In [30]:
sample_size = 2500
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 2500 samples, the accuracy is 0.74
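

Accuracy falls steadily as the slice grows (0.929, 0.862, 0.837, 0.74). One possible explanation, not confirmed here, is that the test split is ordered by star rating, so a small slice from the front contains only low-star reviews; the first five rows printed earlier all have 1 star, which is consistent with that. A minimal sketch for checking the label mix of a slice follows. `stars_to_label` and `label_distribution` are hypothetical helper names and the ratings are made up.

```python
from collections import Counter


def stars_to_label(stars):
    """Map a 1-5 star rating to the sentiment label used in this notebook."""
    if stars == 3:
        return "neutral"
    return "positive" if stars > 3 else "negative"


def label_distribution(stars, sample_size):
    """Count the sentiment labels among the first `sample_size` ratings."""
    return Counter(stars_to_label(rating) for rating in stars[:sample_size])


# Hypothetical ratings sorted by stars, four reviews per rating (not the real data)
sorted_stars = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4

for n in (4, 8, 12, 20):
    print(n, dict(label_distribution(sorted_stars, n)))
```

On the real data the check would be `label_distribution(dataset["test"]["stars"], sample_size)`. If the counts are lopsided, the accuracy for that slice mostly measures performance on one class.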


In [33]:
# shuffle the dataset
dataset = dataset.shuffle()
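

Calling `shuffle()` without a seed produces a different order on every run, so accuracy figures computed afterwards cannot be reproduced exactly. The `datasets` library accepts a `seed` argument to `shuffle` for this purpose. The same idea in plain Python, as a sketch: draw a seeded sample of row indices. `sample_indices` is a hypothetical helper name and the seed value is arbitrary.

```python
import random


def sample_indices(num_rows, sample_size, seed=0):
    """Pick `sample_size` distinct row indices, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(range(num_rows), sample_size)


# The same seed yields the same sample on every call
first = sample_indices(5000, 1000, seed=42)
second = sample_indices(5000, 1000, seed=42)
print(first == second)
```

Indices like these could then be passed to `dataset["test"].select(...)` to evaluate the same random subset on every run.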

In [35]:
sample_size = 2500
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 2500 samples, the accuracy is 0.7464


In [36]:
sample_size = 1000
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 1000 samples, the accuracy is 0.737


In [37]:
sample_size = 500
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 500 samples, the accuracy is 0.766


In [38]:
dataset = dataset.shuffle()
sample_size = 1000
results = classifier(dataset["test"]["review_body"][0:sample_size])
print(f"For {sample_size} samples, the accuracy is {calculate_accuracy(results, dataset)}")

For 1000 samples, the accuracy is 0.74


In [45]:
results = classifier(dataset["test"]["review_body"][0:3500])

RuntimeError: The size of tensor a (582) must match the size of tensor b (512) at non-singleton dimension 1
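

The error suggests that at least one review in this slice tokenized to 582 tokens, more than the 512 positions the DistilBERT model supports. One common workaround is to ask the tokenizer to truncate long inputs by passing `truncation=True` when calling the pipeline. The sketch below also feeds the texts in chunks; `classify_in_batches` is a hypothetical helper, and the chunk size of 100 is an arbitrary choice.

```python
def classify_in_batches(classifier, texts, batch_size=100, max_length=512):
    """Classify texts chunk by chunk, truncating inputs that are too long.

    `classifier` is expected to behave like a transformers text-classification
    pipeline: it takes a list of strings plus tokenizer keyword arguments and
    returns one {"label": ..., "score": ...} dict per input.
    """
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        # truncation=True cuts each input down to at most max_length tokens
        results.extend(classifier(chunk, truncation=True, max_length=max_length))
    return results
```

With the objects defined in this notebook, the call would be `results = classify_in_batches(classifier, dataset["test"]["review_body"][0:3500])`. Truncation discards the end of long reviews, so predictions for those reviews are based on their first 512 tokens only.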

In [None]:
len(results)