In [6]:
paragraph="""Formula 1, often referred to as F1, is the highest class of international single-seater racing and is widely considered the pinnacle of motorsport due to its unmatched combination of speed, technology, and global spectacle. Governing bodies, teams, and engineers collaborate to push the boundaries of automotive innovation, developing cars capable of exceeding 350 km/h while maintaining precision through some of the most challenging corners and circuits in the world. Each season consists of a series of Grand Prix races held across various countries, turning the championship into a worldwide tour that showcases diverse cultures and racing traditions. The sport features 10 teams and 20 elite drivers who undergo rigorous physical and mental training to withstand intense G-force, split-second decision-making, and fierce competition. Strategy plays a crucial role, with pit stops, tire choices, aerodynamics, and real-time data analysis contributing to every race’s outcome. The roar of hybrid power units, the complexity of cutting-edge electronics, and the artistry of aerodynamic design make every car a marvel of modern engineering. Beyond the track, F1 has evolved into a massive global entertainment industry, attracting millions of fans, generating significant economic impact, and inspiring advancements in everyday car technology, including safety systems and fuel-efficiency innovations. Its long history, legendary drivers like Michael Schumacher, Ayrton Senna, and Lewis Hamilton, and iconic teams such as Ferrari, McLaren, and Mercedes have cemented Formula 1’s reputation as not just a sport, but a thrilling fusion of science, skill, passion, and speed."""

In [7]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Nitin\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [8]:
from nltk.stem import PorterStemmer


In [9]:
from nltk.corpus import stopwords
stopwords.words('english')

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on
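
Every entry in the stopword list above is lowercase. A raw membership test therefore misses capitalized, sentence-initial tokens such as "The" or "Each". The sketch below lowercases each token before checking it, and keeps the words in a `set` so that each lookup takes constant time instead of scanning a list. It assumes a small hand-picked subset of the list; `STOP_SUBSET` and `is_stopword` are hypothetical names, not NLTK API.

```python
# Hypothetical subset of NLTK's English stopword list, for illustration only.
STOP_SUBSET = {"a", "and", "each", "is", "of", "the", "to", "with"}


def is_stopword(token: str, stop_set: set) -> bool:
    """Return True if `token` is a stopword, ignoring case."""
    # The list entries are lowercase, so normalize the token first.
    return token.lower() in stop_set


print("The" in STOP_SUBSET)               # False: the raw test is case-sensitive
print(is_stopword("The", STOP_SUBSET))    # True
print(is_stopword("sport", STOP_SUBSET))  # False
```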

In [10]:
stemmer=PorterStemmer()
sentences=nltk.sent_tokenize(paragraph)
sentences

['Formula 1, often referred to as F1, is the highest class of international single-seater racing and is widely considered the pinnacle of motorsport due to its unmatched combination of speed, technology, and global spectacle.',
 'Governing bodies, teams, and engineers collaborate to push the boundaries of automotive innovation, developing cars capable of exceeding 350 km/h while maintaining precision through some of the most challenging corners and circuits in the world.',
 'Each season consists of a series of Grand Prix races held across various countries, turning the championship into a worldwide tour that showcases diverse cultures and racing traditions.',
 'The sport features 10 teams and 20 elite drivers who undergo rigorous physical and mental training to withstand intense G-force, split-second decision-making, and fierce competition.',
 'Strategy plays a crucial role, with pit stops, tire choices, aerodynamics, and real-time data analysis contributing to every race’s outcome.',
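
`nltk.sent_tokenize` relies on NLTK's pre-trained Punkt model, which is built to recognize cases where a period does not end a sentence, such as abbreviations. For comparison, here is a naive splitter that breaks after `.`, `!` or `?` whenever whitespace follows. It shows the baseline Punkt has to improve on. `naive_sent_split` is a hypothetical helper, not NLTK code.

```python
import re


def naive_sent_split(text: str) -> list:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


text = "F1 has 10 teams. Each season is a world tour! Is it the pinnacle?"
print(naive_sent_split(text))
# ['F1 has 10 teams.', 'Each season is a world tour!', 'Is it the pinnacle?']

# The naive rule wrongly splits after an abbreviation.
print(naive_sent_split("Mr. Hamilton won."))
# ['Mr.', 'Hamilton won.']
```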


In [11]:
# remove stopwords, then apply stemming

In [15]:
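stop_words = set(stopwords.words('english'))  # build the lookup set once, not once per word
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    # The membership test is case-sensitive, so capitalized stopwords such as
    # "Each" and "The" are kept; PorterStemmer then lowercases them by default.
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)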
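
The loop above keeps capitalized stopwords ("Each", "The") and punctuation tokens (",", ".", "’"), and both show up in the stemmed output. The sketch below is a stricter, stdlib-only preprocessing step. It makes two assumptions: a regex tokenizer stands in for `nltk.word_tokenize`, and a hypothetical `normalize` callback stands in for `stemmer.stem` or `lemmatizer.lemmatize`.

```python
import re
from typing import Callable, Iterable

# Tokens are runs of letters/digits, optionally joined by hyphens or slashes
# (e.g. "single-seater", "km/h"); punctuation such as "," and "." is dropped.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*")


def preprocess(sentence: str,
               stop_words: Iterable[str],
               normalize: Callable[[str], str] = lambda w: w) -> str:
    """Lowercase, drop stopwords and punctuation, then normalize each token."""
    stop = {w.lower() for w in stop_words}  # build the lookup set once
    tokens = TOKEN_RE.findall(sentence)
    kept = [normalize(t.lower()) for t in tokens if t.lower() not in stop]
    return " ".join(kept)


# Hypothetical stopword subset; the notebook would pass stopwords.words('english').
STOP = ["each", "of", "a", "the", "and", "to", "who"]
print(preprocess("The sport features 10 teams and 20 elite drivers.", STOP))
# sport features 10 teams 20 elite drivers
```

In the notebook, the equivalent call would likely be `preprocess(sentence, stopwords.words('english'), stemmer.stem)`. The regex tokenizer is a simplification: it splits contractions and possessives differently from `word_tokenize`.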

In [17]:
sentences

['formula 1 , often refer f1 , highest class intern single-seat race wide consid pinnacl motorsport due unmatch combin speed , technolog , global spectacl .',
 'govern bodi , team , engin collabor push boundari automot innov , develop car capabl exceed 350 km/h maintain precis challeng corner circuit world .',
 'each season consist seri grand prix race held across variou countri , turn championship worldwid tour showcas divers cultur race tradit .',
 'the sport featur 10 team 20 elit driver undergo rigor physic mental train withstand intens g-forc , split-second decision-mak , fierc competit .',
 'strategi play crucial role , pit stop , tire choic , aerodynam , real-tim data analysi contribut everi race ’ outcom .',
 'the roar hybrid power unit , complex cutting-edg electron , artistri aerodynam design make everi car marvel modern engin .',
 'beyond track , f1 evolv massiv global entertain industri , attract million fan , gener signific econom impact , inspir advanc everyday car techno
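
Stems such as "consid", "pinnacl" and "technolog" are not dictionary words, because a stemmer only strips suffixes by rule. The Porter algorithm applies several ordered steps, each with conditions on the remaining stem. The sketch below is a much cruder single-pass stripper. It is meant only to illustrate the idea, and its results differ from `PorterStemmer`. `crude_stem` and its suffix list are hypothetical, not Porter's rules.

```python
# Hypothetical suffix list, longest first so "ation" is tried before "s", etc.
SUFFIXES = ("ation", "ing", "ed", "ly", "es", "s")
MIN_STEM = 3  # never leave a stem shorter than this


def crude_stem(word: str) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


for w in ["teams", "widely", "developing", "combination", "speed"]:
    print(w, "->", crude_stem(w))
```

Note that "speed" comes out as "spe" because the sketch blindly removes "ed". The notebook's PorterStemmer output keeps "speed" intact, which is what the extra conditions in the real algorithm are for.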

## REMOVING STOPWORDS AND APPLYING LEMMATIZATION

In [23]:
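from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the lookup set once
sentences = nltk.sent_tokenize(paragraph)  # re-split: the stemming loop overwrote `sentences`
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    # pos='v' treats every token as a verb, so nouns such as "boundaries" and "cars" stay unchanged
    words = [lemmatizer.lemmatize(word, pos='v') for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)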

In [24]:
sentences

['Formula 1 , often refer F1 , highest class international single-seater race widely consider pinnacle motorsport due unmatched combination speed , technology , global spectacle .',
 'Governing body , team , engineer collaborate push boundaries automotive innovation , develop cars capable exceed 350 km/h maintain precision challenge corner circuit world .',
 'Each season consist series Grand Prix race hold across various countries , turn championship worldwide tour showcases diverse culture race traditions .',
 'The sport feature 10 team 20 elite drivers undergo rigorous physical mental train withstand intense G-force , split-second decision-making , fierce competition .',
 'Strategy play crucial role , pit stop , tire choices , aerodynamics , real-time data analysis contribute every race ’ outcome .',
 'The roar hybrid power units , complexity cutting-edge electronics , artistry aerodynamic design make every car marvel modern engineer .',
 'Beyond track , F1 evolve massive global ente
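
A lemmatizer maps a word to its dictionary form, and the answer depends on the part of speech. With `pos='v'` every token is looked up as a verb. That is why "boundaries", "countries" and "cars" pass through unchanged while "exceeding" becomes "exceed" and "held" becomes "hold". The sketch below uses a tiny hand-written lexicon to show the lookup. `LEXICON` and `lemmatize` are hypothetical stand-ins for WordNet and `WordNetLemmatizer.lemmatize`, and their coverage is nothing like WordNet's.

```python
# Hypothetical (word, pos) -> lemma table; WordNet plays this role in NLTK.
LEXICON = {
    ("cars", "n"): "car",
    ("boundaries", "n"): "boundary",
    ("exceeding", "v"): "exceed",
    ("held", "v"): "hold",
}


def lemmatize(word: str, pos: str = "n") -> str:
    """Return the lemma for (word, pos), or the word unchanged if unknown."""
    # Like WordNetLemmatizer, fall back to the input when no entry matches.
    return LEXICON.get((word, pos), word)


print(lemmatize("cars", pos="v"))       # cars  (no verb entry)
print(lemmatize("cars", pos="n"))       # car
print(lemmatize("exceeding", pos="v"))  # exceed
print(lemmatize("held", pos="v"))       # hold
```

A common refinement, not used in this notebook, is to tag each token with `nltk.pos_tag` and map the Penn Treebank tag to a WordNet POS before calling the lemmatizer, rather than fixing `pos='v'` for every word.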