<a href="https://colab.research.google.com/github/ibra56/Linear-regression-model/blob/main/GIZ_AI_Skills_Huggingface_NLP_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introductory Workshop on NLP with Transformers and Hugging Face

Welcome to our introductory workshop on Natural Language Processing (NLP) using Transformers and the Hugging Face library. This session is designed to demystify NLP concepts and provide you with hands-on experience in sentiment analysis and topic classification.

## Introduction to NLP and Transformers

### What is NLP?

Natural Language Processing (NLP) is an exciting field of artificial intelligence that focuses on the interaction between computers and humans through natural language. The goal of NLP is to enable computers to understand, interpret, and respond to human languages in a valuable way. Applications of NLP include voice-activated GPS systems, digital assistants, automatic translation services, and many more.

### The Revolution of Transformers

Transformers have revolutionized the way machines understand human language. Unlike earlier recurrent models that processed words one at a time, transformers process all words in a sentence in parallel and use self-attention to relate each word to every other word. This lets the model capture the context of each word more effectively. Models like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are examples of how transformers have set new standards for NLP tasks.

Older NLP architectures mostly relied on vanilla RNNs (shown below) and LSTMs.


<div>
<img src="https://stanford.edu/~shervine/teaching/cs-230/illustrations/architecture-rnn-ltr.png?9ea4417fc145b9346a3e288801dbdfdc" width="60%"/>
</div>

These have all but been replaced by transformers.


<div>
<img src="https://machinelearningmastery.com/wp-content/uploads/2021/08/attention_research_1.png" width="60%"/>
</div>
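
To make the idea of relating every word to every other word more concrete, here is a minimal sketch of self-attention in plain Python. It is a simplification under stated assumptions: the toy "embeddings" are made-up numbers, and the learned projection matrices that real transformers use to build queries, keys and values are left out. Each output row depends only on the inputs, never on another output row, which is why the computation can run in parallel.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified self-attention: queries, keys and values are the inputs themselves.

    Assumption: the learned query/key/value projections of a real
    transformer are omitted to keep the sketch short.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this word against every word in the sentence, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new representation is a weighted average of all word vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Toy 2-dimensional "embeddings" for a three-word sentence (made-up numbers).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(sentence)
for vec in contextual:
    print([round(x, 3) for x in vec])
```

Every output vector mixes information from the whole sentence, in contrast to an RNN, which has to pass information along one step at a time.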

## Getting Started with the Hugging Face Transformers Library

Before we dive into the practical exercises, let's set up our environment by installing the Hugging Face `transformers` library in Google Colab.

```python
!pip install transformers
```

Hugging Face provides a vast ecosystem for NLP, including **pre-trained models, datasets, and tools** for efficient machine learning workflows.







# Task 1: Sentiment Analysis
## Theoretical Background
Sentiment analysis is a technique used in NLP to determine the emotional tone behind words. It's widely used in analyzing opinions from reviews, social media, and more.

## Practical Coding Exercise
Implement sentiment analysis using a pre-trained model from Hugging Face:

In [None]:
from transformers import pipeline

# Load the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

# Analyze the sentiment of a sentence
result = sentiment_pipeline("I don't like transformers for NLP tasks!")
print(result)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9981619715690613}]


This code loads a pre-trained sentiment analysis model and processes a sample sentence to determine its sentiment.
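
The warning in the output recommends naming the model explicitly. The sketch below shows one way to do that, using the model name printed in the warning, and how to read the result. The helper names `load_sentiment_pipeline` and `summarize` are hypothetical and chosen for illustration only. The `transformers` import sits inside the loader so that the formatting helper can be tried without downloading a model.

```python
# Model name copied from the warning above; pinning it keeps runs reproducible.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"


def load_sentiment_pipeline(model_name=MODEL_NAME):
    """Build a sentiment pipeline with an explicit model (downloads weights on first use)."""
    from transformers import pipeline  # lazy import: only needed when a model is loaded

    return pipeline("sentiment-analysis", model=model_name)


def summarize(prediction):
    """Format one result dict such as {'label': 'NEGATIVE', 'score': 0.998}."""
    return f"{prediction['label']}: {prediction['score']:.3f}"


# The pipeline returns a list with one dict per input sentence.
recorded_output = [{"label": "NEGATIVE", "score": 0.9981619715690613}]
print(summarize(recorded_output[0]))

# In Colab, with transformers installed:
# classifier = load_sentiment_pipeline()
# print(summarize(classifier("I don't like transformers for NLP tasks!")[0]))
```

The `score` is the probability the model assigns to the predicted label, so values close to 1 indicate a confident prediction.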





## Exercises

**Exercise 1:** Write a Python function that processes a list of sentences and returns their sentiment labels and scores.

Your function should be callable as follows:

```python
sentences = ["I love learning about AI!", "This workshop is challenging but rewarding.", "I'm not sure I understand all of this."]
print(analyze_sentiments(sentences))
```
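
If you are unsure where to begin, the sketch below shows one possible shape for the function, not the only valid answer. The `classifier` parameter and the `keyword_classifier` stand-in are assumptions added here so the function can be tried without downloading a model. By default it falls back to the same pipeline used earlier.

```python
def analyze_sentiments(sentences, classifier=None):
    """Return one (sentence, label, score) tuple per input sentence.

    `classifier` is any callable that takes a list of strings and returns a
    list of {'label': ..., 'score': ...} dicts, like a Hugging Face pipeline.
    This parameter is an assumption, not part of the exercise statement.
    """
    if classifier is None:
        from transformers import pipeline  # default model, downloaded on first use

        classifier = pipeline("sentiment-analysis")
    predictions = classifier(list(sentences))
    return [
        (text, pred["label"], pred["score"])
        for text, pred in zip(sentences, predictions)
    ]


def keyword_classifier(texts):
    """Hypothetical stand-in for a real model, used only to demonstrate the call."""
    return [
        {"label": "POSITIVE" if "love" in t.lower() else "NEGATIVE", "score": 1.0}
        for t in texts
    ]


demo_sentences = ["I love learning about AI!", "I'm not sure I understand all of this."]
print(analyze_sentiments(demo_sentences, classifier=keyword_classifier))
```

Passing the classifier in as an argument also makes it easy to reuse the same function with different models.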

**Exercise 2:** Try 3 different models and draw a table comparing their performance, for example:

```python
from transformers import pipeline
specific_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")
data = ["I love learning about AI!"]  # any string or list of strings
specific_model(data)
```
You can choose from over 3,000 models on the Hugging Face Hub: https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads&search=sentiment

**Exercise 3:** Create your own sentiment analysis dataset using Uganda-centric data, e.g. by scraping a news site, government site, social media, or a dataverse. Your dataset should contain at least 100 sentences.

**Exercise 4:** Create an account on Hugging Face, upload your best model and datasets to the Hugging Face Hub, and submit the inference link.

See: https://huggingface.co/docs/datasets/v1.16.0/upload_dataset.html and https://huggingface.co/transformers/v4.10.1/model_sharing.html

For more details, including how to fine-tune your own model, see: https://huggingface.co/blog/sentiment-analysis-python

# Task 2: Topic Classification
## Theoretical Background
Topic classification is about identifying the main themes or topics in a piece of text. It's essential for content categorization, information retrieval, and personalized content recommendations.

## Practical Coding Exercise
Next, we'll perform topic classification using another pre-trained model.

In [None]:
from transformers import pipeline

# Initialize the zero-shot classification pipeline
topic_classifier = pipeline("zero-shot-classification")

# Classify the topic of a given text
result = topic_classifier("Transformers are revolutionizing the field of NLP.", candidate_labels=["technology", "health", "finance", "education", "agriculture"])
print(result)


No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Transformers are revolutionizing the field of NLP.', 'labels': ['technology', 'education', 'health', 'finance', 'agriculture'], 'scores': [0.9755109548568726, 0.008430569432675838, 0.0084175830706954, 0.003975448198616505, 0.003665406256914139]}


This code demonstrates how to classify a text into predefined categories, even if the model has not been explicitly trained on those categories.
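
The result is a dictionary in which `labels` and `scores` are parallel lists, sorted from most to least likely. The sketch below reads the top prediction from the output recorded above, with scores rounded for readability. The helper name `top_label` is hypothetical. With the pipeline's default settings the scores are normalized across the candidate labels, so they should sum to roughly 1.

```python
def top_label(result):
    """Return the highest-scoring (label, score) pair from a zero-shot result dict."""
    # 'labels' and 'scores' are parallel lists; pair them up and take the maximum.
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


# Output recorded above, with scores rounded to four decimals.
recorded = {
    "sequence": "Transformers are revolutionizing the field of NLP.",
    "labels": ["technology", "education", "health", "finance", "agriculture"],
    "scores": [0.9755, 0.0084, 0.0084, 0.0040, 0.0037],
}

label, score = top_label(recorded)
print(label, score)
print(round(sum(recorded["scores"]), 3))
```

Taking the maximum explicitly, rather than relying on the first list element, keeps the helper correct even if the result is reordered before it is passed in.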



## Exercises
**Exercise 1:** Develop a function that classifies a list of texts into categories and counts the occurrences of each category.

**Exercise 2:** Collect or find a dataset with text in at least 4 categories and test your function from Exercise 1 on it.

**Exercise 3:** Upload your best model and datasets to Hugging Face and submit the inference link.

Feel free to experiment with different texts and categories. Reflect on how sentiment analysis and topic classification can be applied in various domains.



## Outro

There are many other NLP tasks that can be solved with transformers. Transformers are also the foundational architecture of most modern large language models (LLMs), which have shown capabilities beyond what was previously thought possible for machines.

## Key References

1. **Hugging Face's Transformers Library Documentation:** Provides comprehensive guides and tutorials.  
   [Hugging Face Documentation](https://huggingface.co/docs/transformers/index)

2. **"Attention is All You Need":** The seminal paper introducing transformers.  
   [Attention Paper](https://arxiv.org/abs/1706.03762)

3. **NLP Course by Hugging Face:** A free course on NLP that covers the basics to advanced topics.  
   [Hugging Face Course](https://huggingface.co/course/chapter1)

4. **YouTube Videos for Visual Learners:**  
   - [Introduction to NLP](https://www.youtube.com/watch?v=fOvTtapxa9c) by Stanford University.  
   - [Transformers and Hugging Face](https://www.youtube.com/watch?v=KGPjcdOiXtg) for a hands-on tutorial on using transformers.
5. **Stanford CS 230:** Cheatsheet on recurrent neural networks.  
   [CS 230 RNN Cheatsheet](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks)