### 1. Example with stemming

In [1]:
import re
import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
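from nltk.corpus import stopwords
# sent_tokenize, stopwords and WordNetLemmatizer rely on NLTK data packages fetched with nltk.download()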

In [2]:
para = """in the depths of the human psyche lies an unquenchable thirst for knowledge, a relentless pursuit of understanding the world and our place within it. The frontiers of science, once distant and mysterious, now beckon us with open arms, inviting us to unravel the enigmas of the cosmos and the intricacies of the human mind. From deciphering the fundamental building blocks of matter to exploring the depths of quantum mechanics, the essence of reality unveils itself, inspiring wonder and awe in the hearts of those who dare to delve deeper.

The realms of medicine and biotechnology have witnessed unprecedented breakthroughs, pushing the boundaries of human health and longevity. Gene-editing technologies hold the promise of eradicating hereditary diseases, while regenerative medicine sparks hope for the regeneration of organs and tissues. In the face of pandemics and global health challenges, our resilience shines as scientists collaborate across borders to develop vaccines and treatments, exemplifying the triumph of human cooperation and the power of collective knowledge.

As the digital age progresses, the fusion of physical and virtual realities blurs the lines between the tangible and the intangible. Virtual and augmented reality technologies immerse us in worlds of limitless potential, revolutionizing industries from entertainment and education to healthcare and design. The advent of blockchain technology has sparked a decentralized revolution, reshaping finance, governance, and identity management, empowering individuals with newfound autonomy over their digital lives.

However, with these advancements come ethical quandaries that demand profound introspection. The rise of automation and artificial intelligence raises concerns about job displacement and the future of work. Striking the delicate balance between progress and preserving human dignity challenges society to redefine its values and the nature of meaningful existence.

In the tapestry of culture and arts, humanity weaves its narratives, conveying the essence of our shared human experience. Literature, music, and visual arts transcend language and borders, connecting souls and sparking introspection. Cultural exchange enriches our understanding of one another, promoting tolerance, and celebrating diversity as a vibrant tapestry of stories, beliefs, and customs.

As we continue to navigate this ever-changing landscape, the need for moral compasses and wise leadership becomes ever more apparent. Global challenges, such as climate change, mass migration, and social inequality, demand collective action and unity. It is through compassionate collaboration that we can mend the frayed threads of society, bridging divides and empowering marginalized communities, ensuring that no voice is lost in the symphony of humanity.

"""

In [3]:
ps = PorterStemmer()
lem = WordNetLemmatizer()

In [4]:
sent = nltk.sent_tokenize(para)
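`nltk.sent_tokenize` relies on NLTK's pre-trained Punkt model, which knows common abbreviations. That is why "Dr. APJ Abdul Kalam" stays inside a single sentence in the second example. For contrast, here is a naive regex splitter. It is a hypothetical helper, not part of NLTK, and it breaks on every period followed by whitespace:

```python
import re

def naive_sentence_split(text):
    # Hypothetical helper: split after ".", "!" or "?" whenever whitespace follows.
    # This is NOT how nltk.sent_tokenize works; it has no notion of abbreviations.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

print(naive_sentence_split("He met Dr. Kalam. Thank you."))
# ['He met Dr.', 'Kalam.', 'Thank you.']
```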

In [5]:
len(sent)

18

In [6]:
sent

['in the depths of the human psyche lies an unquenchable thirst for knowledge, a relentless pursuit of understanding the world and our place within it.',
 'The frontiers of science, once distant and mysterious, now beckon us with open arms, inviting us to unravel the enigmas of the cosmos and the intricacies of the human mind.',
 'From deciphering the fundamental building blocks of matter to exploring the depths of quantum mechanics, the essence of reality unveils itself, inspiring wonder and awe in the hearts of those who dare to delve deeper.',
 'The realms of medicine and biotechnology have witnessed unprecedented breakthroughs, pushing the boundaries of human health and longevity.',
 'Gene-editing technologies hold the promise of eradicating hereditary diseases, while regenerative medicine sparks hope for the regeneration of organs and tissues.',
 'In the face of pandemics and global health challenges, our resilience shines as scientists collaborate across borders to develop vaccin

In [8]:
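corpus = []
stop_words = set(stopwords.words("english"))  # NLTK's file id is lowercase; build the set once, not once per word
for sentence in sent:
    review = re.sub("[^a-zA-Z]", " ", sentence)  # replace everything except letters with spaces
    review = review.lower()
    review = review.split()
    review = [ps.stem(word) for word in review if word not in stop_words]  # drop stop words, then stem
    review = " ".join(review)
    corpus.append(review)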
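The pattern `[^a-zA-Z]` replaces every non-letter with a space. As a result, hyphens split compound words and digits disappear. This explains why "gene edit" shows up as two tokens in the stemmed corpus, and why a stray "th" (left over from "11th") appears in the lemmatized corpus of the second example. The cleaning steps in isolation, wrapped in a hypothetical helper named `clean`:

```python
import re

def clean(sentence):
    # Same steps as the loop above: keep ASCII letters only, lowercase, split on whitespace
    return re.sub("[^a-zA-Z]", " ", sentence).lower().split()

print(clean("Gene-editing technologies hold the promise"))
# ['gene', 'editing', 'technologies', 'hold', 'the', 'promise']
print(clean("As the 11th President of India,"))
# ['as', 'the', 'th', 'president', 'of', 'india']
```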

In [9]:
corpus

['depth human psych lie unquench thirst knowledg relentless pursuit understand world place within',
 'frontier scienc distant mysteri beckon us open arm invit us unravel enigma cosmo intricaci human mind',
 'deciph fundament build block matter explor depth quantum mechan essenc realiti unveil inspir wonder awe heart dare delv deeper',
 'realm medicin biotechnolog wit unpreced breakthrough push boundari human health longev',
 'gene edit technolog hold promis erad hereditari diseas regen medicin spark hope regener organ tissu',
 'face pandem global health challeng resili shine scientist collabor across border develop vaccin treatment exemplifi triumph human cooper power collect knowledg',
 'digit age progress fusion physic virtual realiti blur line tangibl intang',
 'virtual augment realiti technolog immers us world limitless potenti revolution industri entertain educ healthcar design',
 'advent blockchain technolog spark decentr revolut reshap financ govern ident manag empow individu ne
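Stems such as "knowledg", "scienc" and "mysteri" in the output above are not dictionary words. The Porter algorithm strips suffixes by rule and never consults a vocabulary. Here is a quick check on a few words from `para`, using the same NLTK class (PorterStemmer needs no downloaded NLTK data):

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
# Words taken from the first paragraph of para
words = ["knowledge", "science", "mysterious", "intricacies", "depths"]
stems = [ps.stem(word) for word in words]
print(stems)
# ['knowledg', 'scienc', 'mysteri', 'intricaci', 'depth']
```

This is the main trade-off against lemmatization in the second example. Stemming is fast and needs no lexicon, but the resulting tokens are harder to read.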

In [10]:
from sklearn.feature_extraction.text import CountVectorizer

In [11]:
cv = CountVectorizer()

In [13]:
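# fit_transform learns the vocabulary from corpus and returns the document-term count matrix;
# toarray() converts the sparse result into a dense NumPy array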
x = cv.fit_transform(corpus).toarray()

In [14]:
x 

array([[0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 1, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)
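Each row of `x` corresponds to one entry of `corpus`, and each column to one vocabulary term. Each cell holds the raw count of that term in that sentence. The standard-library sketch below reproduces the idea under CountVectorizer's documented defaults: lowercasing, `token_pattern=r"(?u)\b\w\w+\b"` (only tokens of two or more word characters are kept), and a vocabulary sorted alphabetically. The function name `bag_of_words` is hypothetical.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # CountVectorizer's default token_pattern

def bag_of_words(docs):
    # Tokenize each document the way CountVectorizer does by default
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    # Sorted vocabulary: column j of the matrix belongs to vocab[j]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix

vocab, matrix = bag_of_words(["human mind human", "quantum mind"])
print(vocab)   # ['human', 'mind', 'quantum']
print(matrix)  # [[2, 1, 0], [0, 1, 1]]
```

In the notebook, `cv.vocabulary_` holds the term-to-column mapping, and `x` has one row per entry of `corpus` (18 here).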

### 2. Example with lemmatization

In [31]:
para2 = """Today, I stand before you to celebrate the life and legacy of one of the most remarkable individuals in modern history – Dr. APJ Abdul Kalam. A visionary, a scientist, and a leader, he was affectionately known as the "Missile Man of India" and, more importantly, as the "People's President."

Born in a humble family in Rameswaram, Tamil Nadu, Abdul Kalam's journey from a small town to becoming one of the most revered figures globally is a testament to the power of hard work, perseverance, and unwavering dedication to a cause. His insatiable thirst for knowledge and passion for science led him to become a prominent aerospace engineer and an instrumental figure in India's missile and space programs.

But beyond his scientific achievements, what truly set Dr. Kalam apart was his humility, his genuine love for people, and his unwavering belief in the youth of the nation. He believed that every individual, regardless of their background, had the potential to achieve greatness, and he dedicated much of his life to inspiring and nurturing young minds.

As the 11th President of India, he embraced the role with humility and grace, becoming an inspiring symbol of unity and progress for the nation. He sought to connect with the youth, encouraging them to dream big, work hard, and contribute to the development of the country. He traveled tirelessly, engaging with students and encouraging them to pursue careers in science and technology, fostering a culture of innovation and excellence.

Dr. Kalam's speeches were filled with wisdom, hope, and the vision of a better India. He often emphasized the importance of values, integrity, and social responsibility, urging the youth to be the change they wished to see in the world. His words resonated deeply with people of all ages, transcending boundaries of region and religion, and leaving an indelible impact on hearts and minds.

Despite all his accomplishments, Abdul Kalam remained a simple and approachable man. He carried himself with grace, warmth, and humility, making everyone feel valued and respected in his presence. His life was a living example of how greatness is not defined by titles or accolades but by the impact we have on the lives of others.

Tragically, on July 27, 2015, India and the world lost a true visionary, but Abdul Kalam's spirit lives on, inspiring generations to come. As we reflect on his life, let us carry forward his vision of a united, prosperous, and technologically advanced India. Let us continue to nurture and empower the youth, for they are the torchbearers of our nation's future.

In memory of Dr. APJ Abdul Kalam, let us strive to embody his values of integrity, compassion, and dedication to the betterment of society. Let us remember that each one of us has the power to make a positive impact, no matter how big or small, and let us work together to build a nation that he would have been proud of.

Thank you."""

In [32]:
sent2 = nltk.sent_tokenize(para2)

In [37]:
len(sent2)

21

In [38]:
import re
from nltk.stem import WordNetLemmatizer

In [39]:
tem = WordNetLemmatizer()

In [50]:
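corpus2 = []
stop_words = set(stopwords.words("english"))  # NLTK's file id is lowercase; build the set once, not once per word
for sentence in sent2:
    review2 = re.sub("[^a-zA-Z]", " ", sentence)  # replace everything except letters with spaces
    review2 = review2.lower()
    review2 = review2.split()
    # lemmatize() treats each word as a noun by default, so verb forms are left unchanged
    review2 = [tem.lemmatize(word) for word in review2 if word not in stop_words]
    review2 = " ".join(review2)
    corpus2.append(review2)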
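`WordNetLemmatizer.lemmatize` treats every word as a noun unless you pass a `pos` argument. That is why plurals like "individuals" and "figures" are reduced in `corpus2`, while verb forms such as "becoming", "believed" and "inspiring" pass through unchanged. One common remedy is to tag each sentence with `nltk.pos_tag` and pass a WordNet POS letter to `lemmatize`. The hypothetical helper below sketches only the tag-mapping step. It assumes Penn Treebank tags, which `nltk.pos_tag` returns by default.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to the POS letter WordNetLemmatizer.lemmatize accepts.

    Hypothetical helper; anything unrecognised falls back to "n" (noun),
    the same default lemmatize uses.
    """
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, VBN, VBP, VBZ)
    if tag.startswith("R"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"  # nouns and every other tag

# Illustrative (word, tag) pairs; the tags are typed by hand, not produced by nltk.pos_tag
for word, tag in [("becoming", "VBG"), ("believed", "VBD"), ("individuals", "NNS"), ("tirelessly", "RB")]:
    print(word, tag, "->", penn_to_wordnet(tag))
```

Inside the loop this would look like `tem.lemmatize(word, pos=penn_to_wordnet(tag))` for each `(word, tag)` pair from `nltk.pos_tag(tokens)`. Tagging tends to be more reliable on the tokens before stop words are removed, and `nltk.pos_tag` needs its own tagger data package.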

In [51]:
corpus2

['today stand celebrate life legacy one remarkable individual modern history dr apj abdul kalam',
 'visionary scientist leader affectionately known missile man india importantly people president',
 'born humble family rameswaram tamil nadu abdul kalam journey small town becoming one revered figure globally testament power hard work perseverance unwavering dedication cause',
 'insatiable thirst knowledge passion science led become prominent aerospace engineer instrumental figure india missile space program',
 'beyond scientific achievement truly set dr kalam apart humility genuine love people unwavering belief youth nation',
 'believed every individual regardless background potential achieve greatness dedicated much life inspiring nurturing young mind',
 'th president india embraced role humility grace becoming inspiring symbol unity progress nation',
 'sought connect youth encouraging dream big work hard contribute development country',
 'traveled tirelessly engaging student encouragin

In [52]:
from sklearn.feature_extraction.text import CountVectorizer

In [54]:
cv2 = CountVectorizer()

In [55]:
x2 = cv2.fit_transform(corpus2).toarray()

In [56]:
x2

array([[1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)