In [23]:
!pip install transformers



In [24]:
from transformers import pipeline, set_seed
import warnings
warnings.filterwarnings("ignore")

#Text Generation

In [25]:
generator = pipeline('text-generation', model='gpt2')
set_seed(42)

In [26]:
generator("Hello, I like to play cricket,", max_length=60, num_return_sequences=7)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I like to play cricket, but what I do in that sport is different—I've been here for over 12 years, and I've heard what people were saying: 'Well, your father should have been playing cricket in this country by now'. The first thing to know is that people"},
 {'generated_text': 'Hello, I like to play cricket, sometimes, but they\'re the same rules," said his nephew, who has been playing cricket for many years and has not been seen on television.\n\n\'One bad cricket match might not be enough\'\n\nA year ago the retired ex-colleg'},
 {'generated_text': "Hello, I like to play cricket, but I've never been in a place that can get me started as a bat pro, so that has to be my goal.\n\nWith the Ashes there is just too much talk about the World Cup qualifiers and how they don't get any more important."},
 {'generated_text': 'Hello, I like to play cricket, not cricket."\n\nAnd you\'re not the smartest.\n\n"Yeah, yeah, yeah, yeah. Actually, I\'m pretty good at cricket

In [27]:
generator("Natural Language Processing is evolving technology", max_length=10, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Natural Language Processing is evolving technology that is capable of'},
 {'generated_text': 'Natural Language Processing is evolving technology (through new models'},
 {'generated_text': 'Natural Language Processing is evolving technology, with new opportunities'},
 {'generated_text': 'Natural Language Processing is evolving technology to bring the entire'},
 {'generated_text': 'Natural Language Processing is evolving technology that will enable humans'}]

#Question Answering

In [28]:
question_answerer = pipeline('question-answering')

question_answerer({
    'question': 'What is the Newtons third law of motion?',
    'context': 'Newton’s third law of motion states that, "For every action there is equal and opposite reaction"'})

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.6062517762184143,
 'start': 42,
 'end': 97,
 'answer': '"For every action there is equal and opposite reaction"'}
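
The `start` and `end` fields in the result above are character offsets into `context`, so slicing the context with them should return the same string as `answer`. A minimal check, assuming nothing beyond the values printed above (the dictionary is copied from the output, so no model is loaded):

```python
# Context string and result copied from the cell above; no pipeline call is made.
context = 'Newton’s third law of motion states that, "For every action there is equal and opposite reaction"'
result = {
    'score': 0.6062517762184143,
    'start': 42,
    'end': 97,
    'answer': '"For every action there is equal and opposite reaction"',
}

# Slice the context with the reported character offsets.
span = context[result['start']:result['end']]
print(span == result['answer'])
```

Because the offsets point back into the original context, they can be used to highlight the answer in place rather than searching for the answer text again.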

In [29]:
nlp = pipeline("question-answering")

context = r"""
Microsoft was founded by Bill Gates and Paul Allen in the year 1975.
The property of being prime (or not) is called primality.
A simple but slow method of verifying the primality of a given number n is known as trial division.
It consists of testing whether n is a multiple of any integer between 2 and itself.
Algorithms much more efficient than trial division have been devised to test the primality of large numbers.
These include the Miller–Rabin primality test, which is fast but has a small probability of error, and the AKS primality test, which always produces the correct answer in polynomial time but is too slow to be practical.
Particularly fast methods are available for numbers of special forms, such as Mersenne numbers.
As of January 2016, the largest known prime number has 22,338,618 decimal digits.
"""

result = nlp(question="What is a simple method to verify primality?", context=context)

print(f"Answer 1: '{result['answer']}'")

result = nlp(question="When did Bill Gates found Microsoft?", context=context)

print(f"Answer 2: '{result['answer']}'")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Answer 1: 'trial division'
Answer 2: '1975'
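
The warning above advises against relying on the default model. One way to follow that advice is to pass the model name and revision explicitly. The sketch below reuses the name and revision hash reported in the warning; the helper name `make_question_answerer` is hypothetical, and the import sits inside the function so that defining it does not load or download anything.

```python
# Model name and revision taken from the warning printed above.
QA_MODEL = "distilbert/distilbert-base-cased-distilled-squad"
QA_REVISION = "626af31"

def make_question_answerer():
    """Build a question-answering pipeline pinned to a known model and revision."""
    # Imported lazily so this cell can be defined without loading transformers.
    from transformers import pipeline
    return pipeline("question-answering", model=QA_MODEL, revision=QA_REVISION)

# Usage (downloads the model on first call):
# nlp = make_question_answerer()
# nlp(question="What is a simple method to verify primality?", context=context)
```

Pinning the revision keeps results reproducible even if the default model for the task changes in a later release of the library.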


#Machine Translation

In [30]:
# English to German
translator_ger = pipeline("translation_en_to_de")
print("German: ",translator_ger("Joe Biden became the 46th president of U.S.A.", max_length=40)[0]['translation_text'])

No model was supplied, defaulted to google-t5/t5-base and revision 686f1db (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


German:  Joe Biden wurde der 46. Präsident der USA.


In [31]:
translator_fr = pipeline('translation_en_to_fr')
print("French: ", translator_fr("Joe Biden became the 46th president of U.S.A", max_length=40)[0]['translation_text'])

No model was supplied, defaulted to google-t5/t5-base and revision 686f1db (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


French:  Joe Biden est devenu le 46e président des États-Unis


#Text Summarization

In [33]:
#using nltk
import nltk
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
import numpy as np

nltk.download('punkt')
nltk.download('stopwords')

def read_article(text):
    sentences = nltk.sent_tokenize(text)
    return sentences

def sentence_similarity(sent1, sent2, stop_words):
    words1 = nltk.word_tokenize(sent1)
    words2 = nltk.word_tokenize(sent2)

    words1 = [word.lower() for word in words1 if word.isalnum() and word.lower() not in stop_words]
    words2 = [word.lower() for word in words2 if word.isalnum() and word.lower() not in stop_words]

    all_words = list(set(words1 + words2))

    vector1 = [1 if word in words1 else 0 for word in all_words]
    vector2 = [1 if word in words2 else 0 for word in all_words]

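    # A sentence with no remaining words gives an all-zero vector; treat it as
    # dissimilar instead of dividing by a zero norm inside cosine_distance.
    if not any(vector1) or not any(vector2):
        return 0.0

    return 1 - cosine_distance(vector1, vector2)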

def build_similarity_matrix(sentences, stop_words):
    similarity_matrix = np.zeros((len(sentences), len(sentences)))

    for i in range(len(sentences)):
        for j in range(len(sentences)):
            if i != j:
                similarity_matrix[i][j] = sentence_similarity(sentences[i], sentences[j], stop_words)

    return similarity_matrix

def generate_summary(text, num_sentences=5):
    stop_words = set(stopwords.words("english"))

    sentences = read_article(text)

    similarity_matrix = build_similarity_matrix(sentences, stop_words)

    sentence_scores = np.array(similarity_matrix.sum(axis=1))

    ranked_sentences = np.argsort(-sentence_scores)

    summary_sentences = [sentences[i] for i in ranked_sentences[:num_sentences]]
    summary = ' '.join(summary_sentences)

    return summary

input_text = """
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans using natural language. It involves the development of algorithms and models that enable computers to understand, interpret, and generate human-like text. Text summarization is one of the applications of NLP, where the goal is to generate a concise and coherent summary of a given document or piece of text.

There are two main approaches to text summarization: extractive and abstractive. Extractive summarization involves selecting and combining important sentences or phrases from the original text to create a summary. Abstractive summarization, on the other hand, involves generating new sentences that capture the key information from the original text.

In this example, we will focus on extractive summarization using NLTK, a popular NLP library in Python. We will use a simple algorithm that calculates the cosine similarity between sentences to identify the most important ones for the summary.

Let's try generating a summary for this sample text using our program.
"""

summary = generate_summary(input_text)
print(summary)


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Win11\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Win11\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Extractive summarization involves selecting and combining important sentences or phrases from the original text to create a summary. Abstractive summarization, on the other hand, involves generating new sentences that capture the key information from the original text. There are two main approaches to text summarization: extractive and abstractive. Text summarization is one of the applications of NLP, where the goal is to generate a concise and coherent summary of a given document or piece of text. Let's try generating a summary for this sample text using our program.
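
The sentence scores behind the summary above come from cosine similarity between binary bag-of-words vectors. A minimal pure-Python sketch of that calculation, independent of NLTK, is shown here; the helper names `binary_vectors` and `cosine_similarity` and the two word lists are illustrative assumptions, not part of the cell above.

```python
import math

def binary_vectors(words1, words2):
    """Encode two word lists as 0/1 vectors over their shared vocabulary."""
    vocab = sorted(set(words1) | set(words2))
    set1, set2 = set(words1), set(words2)
    vec1 = [1 if word in set1 else 0 for word in vocab]
    vec2 = [1 if word in set2 else 0 for word in vocab]
    return vec1, vec2

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two toy sentences, already lower-cased and stripped of stop words.
v1, v2 = binary_vectors(
    ["extractive", "summarization", "selects", "sentences"],
    ["abstractive", "summarization", "generates", "sentences"],
)
score = cosine_similarity(v1, v2)
print(round(score, 3))  # 0.5: two shared words, four words per sentence
```

Each sentence's final score is the sum of its similarities to every other sentence, so sentences that share vocabulary with many others rank highest.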


In [32]:
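#using spacy
# Caution: an unpinned numpy upgrade installs numpy 2.x, which conflicts with
# packages in this environment that require numpy<2 (see the resolver errors in the output).
!pip install --upgrade numpy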

!pip uninstall blis spacy -y
!pip install blis spacy
!python -m spacy download en_core_web_sm


import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Load spaCy model
nlp = spacy.load('en_core_web_sm')

# Function to preprocess text
def preprocess_text(text):
    doc = nlp(text)
    tokens = [token.text for token in doc if not token.is_stop and not token.is_punct]
    return ' '.join(tokens)

# Generate summary function
def generate_summary(text, num_sentences=5):
    # Split the text into sentences, dropping surrounding whitespace and empty spans
    sentences = [sent.text.strip() for sent in nlp(text).sents if sent.text.strip()]
    
    # Preprocess each sentence
    preprocessed_sentences = [preprocess_text(sentence) for sentence in sentences]
    
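    # Vectorize the sentences as bag-of-words count vectors
    sentence_vectors = CountVectorizer().fit_transform(preprocessed_sentences)
    
    # Calculate cosine similarity matrix
    similarity_matrix = cosine_similarity(sentence_vectors, sentence_vectors)
    
    # Score each sentence based on its similarity to other sentences
    # (zero the diagonal so a sentence's similarity to itself is not counted)
    np.fill_diagonal(similarity_matrix, 0)
    sentence_scores = similarity_matrix.sum(axis=1)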
    
    # Rank sentences based on their score
    ranked_sentences = np.argsort(sentence_scores)[::-1]
    
    # Select top N sentences
    summary_sentences = [sentences[i] for i in ranked_sentences[:num_sentences]]
    
    # Join the selected sentences to form the final summary
    summary = ' '.join(summary_sentences)
    return summary

# Sample input text
input_text = """
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans using natural language. It involves the development of algorithms and models that enable computers to understand, interpret, and generate human-like text. Text summarization is one of the applications of NLP, where the goal is to generate a concise and coherent summary of a given document or piece of text.

There are two main approaches to text summarization: extractive and abstractive. Extractive summarization involves selecting and combining important sentences or phrases from the original text to create a summary. Abstractive summarization, on the other hand, involves generating new sentences that capture the key information from the original text.

In this example, we will focus on extractive summarization using spaCy, a popular NLP library in Python. We will use a simple algorithm that calculates the cosine similarity between sentences to identify the most important ones for the summary.

Let's try generating a summary for this sample text using our program.
"""

# Generate and print the summary
summary = generate_summary(input_text)
print(summary)



ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.0 requires FuzzyTM>=0.4.0, which is not installed.
astropy 5.3.4 requires numpy<2,>=1.21, but you have numpy 2.1.2 which is incompatible.
contourpy 1.2.0 requires numpy<2.0,>=1.20, but you have numpy 2.1.2 which is incompatible.
matplotlib 3.8.0 requires numpy<2,>=1.21, but you have numpy 2.1.2 which is incompatible.
numba 0.59.0 requires numpy<1.27,>=1.22, but you have numpy 2.1.2 which is incompatible.
pandas 2.1.4 requires numpy<2,>=1.23.2; python_version == "3.11", but you have numpy 2.1.2 which is incompatible.
pywavelets 1.5.0 requires numpy<2.0,>=1.22.4, but you have numpy 2.1.2 which is incompatible.
scipy 1.11.4 requires numpy<1.28.0,>=1.21.6, but you have numpy 2.1.2 which is incompatible.
streamlit 1.30.0 requires numpy<2,>=1.19.3, but you have numpy 2.1.2 which is incompatible.
tensorflow-in


Collecting numpy
  Using cached numpy-2.1.2-cp311-cp311-win_amd64.whl.metadata (59 kB)
Using cached numpy-2.1.2-cp311-cp311-win_amd64.whl (12.9 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.0.2
    Uninstalling numpy-2.0.2:
      Successfully uninstalled numpy-2.0.2
Successfully installed numpy-2.1.2
Found existing installation: blis 1.0.1
Uninstalling blis-1.0.1:
  Successfully uninstalled blis-1.0.1
Found existing installation: spacy 3.8.2
Uninstalling spacy-3.8.2:
  Successfully uninstalled spacy-3.8.2
Collecting blis
  Using cached blis-1.0.1-cp311-cp311-win_amd64.whl.metadata (7.8 kB)
Collecting spacy
  Using cached spacy-3.8.2-cp311-cp311-win_amd64.whl.metadata (27 kB)
Collecting numpy<3.0.0,>=2.0.0 (from blis)
  Using cached numpy-2.0.2-cp311-cp311-win_amd64.whl.metadata (59 kB)
Using cached blis-1.0.1-cp311-cp311-win_amd64.whl (6.3 MB)
Using cached spacy-3.8.2-cp311-cp311-win_amd64.whl (12.2 MB)
Using cached num

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.0 requires FuzzyTM>=0.4.0, which is not installed.
astropy 5.3.4 requires numpy<2,>=1.21, but you have numpy 2.0.2 which is incompatible.
contourpy 1.2.0 requires numpy<2.0,>=1.20, but you have numpy 2.0.2 which is incompatible.
matplotlib 3.8.0 requires numpy<2,>=1.21, but you have numpy 2.0.2 which is incompatible.
numba 0.59.0 requires numpy<1.27,>=1.22, but you have numpy 2.0.2 which is incompatible.
pandas 2.1.4 requires numpy<2,>=1.23.2; python_version == "3.11", but you have numpy 2.0.2 which is incompatible.
pywavelets 1.5.0 requires numpy<2.0,>=1.22.4, but you have numpy 2.0.2 which is incompatible.
scipy 1.11.4 requires numpy<1.28.0,>=1.21.6, but you have numpy 2.0.2 which is incompatible.
streamlit 1.30.0 requires numpy<2,>=1.19.3, but you have numpy 2.0.2 which is incompatible.
tensorflow-in

ValueError: BLIS support requires blis: pip install blis
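
The log above shows numpy being moved to 2.1.2 and then back to 2.0.2 while blis and spacy were reinstalled, so it is worth checking which versions are actually installed before retrying. A small diagnostic sketch using only the standard library follows; the helper name `installed_versions` and the default package list are assumptions chosen for this notebook.

```python
from importlib import metadata

def installed_versions(packages=("numpy", "blis", "spacy")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Print what the current environment reports for each package.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If packages were reinstalled while the kernel was running, restarting the kernel before importing spacy again may be necessary, since modules that are already loaded are not replaced by a reinstall.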