# 1. Loading Libraries

In [2]:
import os
import sys
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    GenerationConfig,
    pipeline,
)
print('ready loading data')

# Check Python version
print("Python version:", sys.version)

# Check Conda environment
conda_env = os.environ.get('CONDA_DEFAULT_ENV')
print("Conda environment:", conda_env)

ready loading data
Python version: 3.9.19 (main, May  6 2024, 20:12:36) [MSC v.1916 64 bit (AMD64)]
Conda environment: base


In [17]:
#!pip install --upgrade transformers numpy tensorflow
#!pip install tf-keras

# 2. Defining functions

# 3. Natural Language Processing using Generative AI

### 3.1. Text Generation

In [11]:
from transformers import pipeline 

# create the text-generation pipeline
generator = pipeline('text-generation', model='gpt2')

# example of text generation
result = generator("Once upon a time,")
print(result)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Once upon a time, I thought I was going to read an article called 'How to Grow a Small Garden' by a retired, self-deprecating, professional gardener and, now, I have a few recommendations.\n\n1)"}]


In [29]:
llm = pipeline('text-generation')
prompt = "New York city is famous for"
outputs = llm(prompt, max_length=150)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


New York city is famous for its high-speed-oriented cycling projects (CBOs), but there's still something to be said for a city whose urban population is now only seven times the size of Denver, Texas, which is still much smaller.

And unlike Detroit or Denver, New York has been able to generate revenue from public transportation. New York also started developing public transit as well as adding bike lanes at the city's bus stops—but the city's lack of funding has made it even less effective in keeping people moving.

"I really want to see a world in which people are able to get around, if they are at all comfortable, the city," said Mayor Michael Bloomberg.

Bloomberg said his department aims


### 3.2. Sentiment Analysis

######  Ex1

In [13]:
# loading pipeline
from transformers import pipeline

# create the sentiment-analysis pipeline
classifier = pipeline('sentiment-analysis')

# example of sentiment analysis
result = classifier("I love to programm in Python and R!")
print(result)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9950636029243469}]
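
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. A minimal sketch of how such a result might be post-processed follows. The helper `confident_label` and its 0.9 threshold are hypothetical choices for illustration, not part of `transformers`, and the input reuses the output printed above instead of rerunning the model.

```python
def confident_label(result, threshold=0.9):
    """Return the label if its score reaches the threshold, else 'UNCERTAIN'.

    `result` is one dictionary from the pipeline's output list.
    This helper and its default threshold are illustrative assumptions.
    """
    return result['label'] if result['score'] >= threshold else 'UNCERTAIN'

# Output copied from the cell above
result = [{'label': 'POSITIVE', 'score': 0.9950636029243469}]
print(confident_label(result[0]))
print(confident_label(result[0], threshold=0.999))
```

A stricter threshold trades coverage for reliability; a suitable value would depend on the application.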


######  Ex2

In [25]:
from transformers import pipeline

sentiment_classifier = pipeline("text-classification")

outputs = sentiment_classifier(""" Dear Seller, I got very impressed with the fast devlivery and careful packaging of my order""")

print(outputs[0]['label'])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


POSITIVE


### 3.3. Question Answering

In [22]:
from transformers import pipeline

# create a question-answering pipeline
qa = pipeline('question-answering')

# provide a context and a question
context = "Columbia University is a university located in New York"
question = 'Where is located Columbia University?'

# get the answer
result = qa(question=question, context=context)
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9960016012191772, 'start': 47, 'end': 55, 'answer': 'New York'}
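
The `start` and `end` fields are character offsets into `context`, so slicing the context with them should reproduce the `answer`. The short check below uses the output printed above rather than rerunning the model.

```python
context = "Columbia University is a university located in New York"
# Output copied from the cell above
result = {'score': 0.9960016012191772, 'start': 47, 'end': 55, 'answer': 'New York'}

# Slice the context with the reported character offsets
span = context[result['start']:result['end']]
print(span)
print(span == result['answer'])
```

The offsets are useful when the answer needs to be highlighted in the original text.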


In [37]:
llm = pipeline("question-answering")
context = "My name is Basilio and I like computational biology and machine learning"
question = "What Basilio is talking about?"
outputs = llm(question=question, context=context)
print(outputs['answer'])

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


computational biology and machine learning


### 3.4. Text Summarization

In [36]:
llm = pipeline("summarization", model="facebook/bart-large-cnn")
long_text = "Alzheimer’s disease is highly heritable and characterized by amyloid plaques and tau tangles in the brain. The aim of this study was to investigate the association between genetic predisposition, Aβ misfolding in blood plasma, a unique marker of Alzheimer associated neuropathological changes, and Alzheimer’s disease occurrence within 14 years. Witin a German community-based cohort, two polygenic risk scores (clinical Alzheimer’s disease and Aβ42 based) were calculated, APOE genotype was determined, and Aβ misfolding in blood plasma was measured by immunoinfrared sensor in 59 participants diagnosed with Alzheimer’s disease during 14 years of follow-up and 581 participants without dementia diagnosis."
outputs = llm(long_text, max_length=60, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

Alzheimer’s disease is highly heritable and characterized by amyloid plaques and tau tangles in the brain. The aim of this study was to investigate the association between genetic predisposition, Aβ misfolding in blood plasma, a unique marker of Alzheimer
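
The summary above stops mid-sentence, most likely because `max_length=60` caps the number of generated tokens. One possible post-processing step is to drop the unfinished trailing sentence. The helper `trim_to_last_sentence` below is a hypothetical name written in plain Python (not a `transformers` function), and it is applied to the output copied from the cell above.

```python
def trim_to_last_sentence(text):
    """Cut the text after its last '.', '!' or '?'; return it unchanged if none is found."""
    last = max(text.rfind(mark) for mark in ".!?")
    return text[: last + 1] if last != -1 else text

# Output copied from the cell above
summary = (
    "Alzheimer’s disease is highly heritable and characterized by amyloid "
    "plaques and tau tangles in the brain. The aim of this study was to "
    "investigate the association between genetic predisposition, Aβ misfolding "
    "in blood plasma, a unique marker of Alzheimer"
)
print(trim_to_last_sentence(summary))
```

This is a simple heuristic: it treats every period as a sentence end, so abbreviations such as "e.g." could cause an early cut.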


### 3.5. Language Translation

In [39]:
llm = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
text = "My name is Basilio and I like computational biology and machine learning"
outputs = llm(text, clean_up_tokenization_spaces=True)
print(outputs[0]['translation_text'])

Me llamo Basilio y me gusta la biología computacional y el aprendizaje automático.


### 3.6. Replies to Customers

In [4]:
# Create a pipeline for text generation using the gpt2 model
generator = pipeline('text-generation', model='gpt2', truncation=True)

text = "I had a wonderful stay at the Riverview Hotel! The staff were incredibly attentive and the amenities were top-notch. The only hiccup was a slight delay in room service, but that didn't overshadow the fantastic experience I had"
response = "Dear valued customer, I am glad to hear you had a good stay with us."

# Build the prompt for the text generation LLM
prompt = f"Customer review:\n{text}\n\nHotel reponse to the customer:\n{response}"

# Pass the prompt to the model pipeline
outputs = generator(prompt, max_length=150, pad_token_id=generator.tokenizer.eos_token_id)

# Print the augmented sequence generated by the model
print(outputs[0]['generated_text'])

Customer review:
I had a wonderful stay at the Riverview Hotel! The staff were incredibly attentive and the amenities were top-notch. The only hiccup was a slight delay in room service, but that didn't overshadow the fantastic experience I had

Hotel reponse to the customer:
Dear valued customer, I am glad to hear you had a good stay with us. My wife and I loved our stay on the Riverview, and I felt like it was the only time that we had met. The service was very professional, excellent quality of care, and the bar was extremely clean and friendly. And no matter what you order: we always get our drinks delivered. We hope that you return to Riverview for the
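
As the output above shows, the generated text starts with the prompt itself and then continues it. To keep only the newly generated part, one option is to strip the prompt prefix. The helper `strip_prompt` below is a hypothetical name, and the strings are shortened stand-ins for the real prompt and model output. The text-generation pipeline also accepts `return_full_text=False`, which should have a similar effect.

```python
def strip_prompt(generated_text, prompt):
    """Return only the text that follows the prompt, if the prompt is a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

# Shortened stand-ins for the prompt and the model output (illustrative only)
prompt = "Customer review:\nGreat stay!\n\nHotel response to the customer:\nDear valued customer,"
generated = prompt + " thank you for your kind words."
print(strip_prompt(generated, prompt))
```

If the generated text does not start with the prompt, the helper returns it unchanged rather than guessing where the continuation begins.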
