<a href="https://colab.research.google.com/github/jonbaer/googlecolab/blob/master/BasicLingua_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BasicLingua Library for NLP

Created By
* [Fareed Khan](https://github.com/FareedKhan-dev)
* [Asad Rizvi](https://github.com/Asad-94)


GitHub Repository: [BasicLingua](https://github.com/FareedKhan-dev/basic_lingua)


In [None]:
!pip install basiclingua

In [None]:
# Importing Module
from basiclingua import BasicLingua
client = BasicLingua(api_key="YOUR_GEMINI_API_KEY")
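
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead; the variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for illustration, not part of BasicLingua.

```python
import os

def load_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing.

    Both the helper and the variable name are hypothetical; use whatever
    name your environment actually defines.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key

# Intended use (not executed here, since it needs a real key):
# client = BasicLingua(api_key=load_api_key())
```

In Colab, the variable can be set earlier in the session with `os.environ["GEMINI_API_KEY"] = "..."`.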


In [None]:
# Pattern Extraction

user_input = '''The phone number of fareed khan and asad rizvi are 123-456-7890 and 523-456-7892. Please call for assistance and email me at x123@gmail.com'''
patterns = '''email, phone number, name'''

extracted_patterns = client.extract_patterns(user_input, patterns)

print(extracted_patterns)

['123-456-7890', '523-456-7892', 'fareed khan', 'asad rizvi', 'x123@gmail.com']
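
For rigidly formatted patterns such as phone numbers and emails, a regular-expression baseline is a cheap way to cross-check the model's answer. This is a plain-Python sketch, not how `extract_patterns` works internally; the two regexes are simplified assumptions that cover only the formats in this example, and free-form patterns like names still need the model.

```python
import re

text = ("The phone number of fareed khan and asad rizvi are 123-456-7890 and "
        "523-456-7892. Please call for assistance and email me at x123@gmail.com")

# Simplified patterns: NNN-NNN-NNNN phone numbers and basic email addresses.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

phones = PHONE_RE.findall(text)
emails = EMAIL_RE.findall(text)

print(phones)
print(emails)
```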


In [None]:
# Text Translation

user_input = '''The phone number of fareed khan and asad rizvi are 123-456-7890 and 523-456-7892. Please call for assistance and email me at x123@gmail.com'''
target_lang = "korean"

translated_text = client.text_translate(user_input, target_lang)

print(translated_text)

파리드칸과 아사드 리즈비의 전화번호는 123-456-7890과 523-456-7892입니다. 도움이 필요하시면 전화 주시거나 x123@gmail.com으로 이메일 보내 주시기 바랍니다.


In [None]:
# Text Replacement

user_input = '''I love Lamborghini, but Bugatti is even better. Although, Mercedes is a class above all.'''
replacement_rules = '''all mentioned cars with mehran but mercedes with toyota'''

answer = client.text_replace(user_input, replacement_rules)
print(answer)

I love mehran, but mehran is even better. Although, toyota is a class above all.
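
When the replacements are known in advance, the same result can be produced deterministically without a model call. A minimal sketch using a lookup table; the `rules` mapping is written by hand here and is an assumption about what the natural-language rule above means.

```python
import re

text = ("I love Lamborghini, but Bugatti is even better. "
        "Although, Mercedes is a class above all.")

# Hand-written mapping that mirrors the natural-language rule.
rules = {"Lamborghini": "mehran", "Bugatti": "mehran", "Mercedes": "toyota"}

# One alternation pattern so every replacement happens in a single pass.
pattern = re.compile("|".join(re.escape(name) for name in rules))
replaced = pattern.sub(lambda match: rules[match.group(0)], text)

print(replaced)
```

The model-based call is more useful when the rule is fuzzy ("all mentioned cars") and the entities are not known ahead of time.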


In [None]:
# Named Entity Recognition

user_input = '''I love Lamborghini, but Bugatti is even better. Although, Mercedes is a class above all. and I work in Google'''

answer = client.detect_ner(user_input, ner_tags="cars, date, time")
print(answer)

[('Lamborghini', 'cars'), ('Bugatti', 'cars'), ('Mercedes', 'cars'), ('Google', 'organization')]


In [None]:
# Text Summarization

user_input = '''I love Lamborghini, but Bugatti is even better. Although, Mercedes is a class above all. and I work in Google'''
summary_length = 'medium' # short, medium, long

summary = client.text_summarize(user_input, summary_length)

print(summary)

The given text expresses a preference for Bugatti over Lamborghini, and Mercedes over both. It also mentions employment at Google. These preferences and the employment information constitute the main ideas of the text.


In [None]:
# Text Question Answering

user_input = '''OpenAI has hosted a hackathon for developers to build AI models. The event took place on 15th October 2022. The event was a huge success with over 1000 participants from around the world.'''
question = "When did the event happen?"

answer = client.text_qna(user_input, question)
print(answer)

15th October 2022


In [None]:
# Text Intent Detection

user_input = '''let's book a flight for our vacation and reserve a table at a restaurant for dinner. also going to watch football match at 8 pm.'''
intent = client.text_intent(user_input)
print(intent)

['Book Flight', 'Reserve Restaurant', 'Watch Football Match']


In [None]:
# Text Lemmatization/Stemming

user_input = '''OpenAI has hosted a hackathon for developers to build AI models. The event took place on 15th October 2022. The event was a huge success with over 1000 participants from around the world.'''

answer = client.text_lemstem(user_input, task_type="stemming")
print(answer)

OpenAI host a hackathon for developer to build AI model. The event take place on 15th October 2022. The event be a huge success with over 1000 participant from around the world.


In [None]:
# Text Tokenization

user_input = '''OpenAI has hosted a hackathon for developers to build AI models. The event took place on 15th October 2022. The event was a huge success with over 1000 participants from around the world.'''

answer = client.text_tokenize(user_input, ".")
print(answer)

['OpenAI has hosted a hackathon for developers to build AI models', ' The event took place on 15th October 2022', ' The event was a huge success with over 1000 participants from around the world', '']
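
The trailing empty string and the leading spaces in this output are what a plain `str.split` produces when the text ends with the delimiter. Whether `text_tokenize` uses `str.split` internally is an assumption; the sketch below only reproduces the same shape and shows one way to clean it up.

```python
text = ("OpenAI has hosted a hackathon for developers to build AI models. "
        "The event took place on 15th October 2022. "
        "The event was a huge success with over 1000 participants from around the world.")

# Splitting on "." leaves an empty final piece because the text ends with ".".
raw_tokens = text.split(".")

# Strip surrounding whitespace and drop pieces that are empty afterwards.
clean_tokens = [piece.strip() for piece in raw_tokens if piece.strip()]

print(len(raw_tokens), len(clean_tokens))
```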


In [None]:
# Text Embedding

user_input = '''OpenAI has hosted a hackathon for developers to build AI models. The event took place on 15th October 2022. The event was a huge success with over 1000 participants from around the world.'''
task_type = "RETRIEVAL_QUERY" # "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING"

answer = client.text_embedd(user_input, task_type)
print(answer[:10])

[-0.04192694, -0.05051928, -0.034939777, 0.025280714, 0.012726914, 0.026818482, 0.068937935, 0.014136571, 0.08999535, -0.012914751]
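
Embedding vectors are usually compared with cosine similarity. A minimal pure-Python sketch follows; the three short vectors are hypothetical stand-ins for real embeddings, which would come from calls like the one above.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, not real model output.
query = [0.1, 0.3, -0.2]
doc_close = [0.2, 0.6, -0.4]   # points in the same direction as query
doc_far = [0.3, -0.1, 0.0]     # orthogonal to query

print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

For retrieval, the query and the documents would typically be embedded with the matching `RETRIEVAL_QUERY` and `RETRIEVAL_DOCUMENT` task types listed in the comment above.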


In [None]:
# Text Generation

user_input = '''Generate a poem of wolf chasing the moon.'''
ans_length = 'short' # short, medium, long

answer = client.text_generate(user_input, ans_length)
print(answer)

In the ethereal realm, a wolf on the prowl,
Its gleaming eyes fixed on the celestial jewel,
The moon, a shimmering beacon in the sky,
A chase unfolds, a dance that fills the night.


In [None]:
# Text Spam Detection

user_input = '''Congratulations! You have won a lottery of $1,000,000!'''
num_classes = "harmed, not_harmed, unknown"

answer = client.detect_spam(user_input, num_classes, explanation=False)
print(answer)

{'prediction': 'harmed'}
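
Since the label is generated by a model, it can be worth checking that it is one of the labels that were requested before acting on it. A small sketch, assuming the result is a dict with a `prediction` key as in the output above; the helper names are hypothetical.

```python
def parse_labels(num_classes):
    """Split a comma-separated label string into a clean list of labels."""
    return [label.strip() for label in num_classes.split(",") if label.strip()]

def checked_prediction(result, num_classes, fallback="unknown"):
    """Return the predicted label if it was requested, otherwise the fallback."""
    allowed = parse_labels(num_classes)
    label = str(result.get("prediction", "")).strip()
    return label if label in allowed else fallback

num_classes = "harmed, not_harmed, unknown"
print(checked_prediction({"prediction": "harmed"}, num_classes))
print(checked_prediction({"prediction": "spam"}, num_classes))
```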


In [None]:
# Text Cleaning

user_input = '''<h1>Heading</h1> <a>para</a> visit to this website https://www.google.com for more information about the product. and you can find the product at this address 1234'''
clean_info = '''remove a tags but keep their inner text'''

answer = client.text_clean(user_input, clean_info)
print(answer)

<h1>Heading</h1> para visit to this website https://www.google.com for more information about the product. and you can find the product at this address 1234
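
For this specific instruction, a regular expression gives the same result deterministically. This is a sketch for simple, well-formed markup only (regexes are fragile on real-world HTML), and `strip_anchor_tags` is a hypothetical helper, not a BasicLingua function.

```python
import re

# Matches opening and closing <a> tags, with or without attributes,
# but not other tags that merely start with "a" such as <abbr>.
A_TAG_RE = re.compile(r"</?a(?:\s[^>]*)?>", re.IGNORECASE)

def strip_anchor_tags(html):
    """Remove <a ...> and </a> tags while keeping the text between them."""
    return A_TAG_RE.sub("", html)

html = ("<h1>Heading</h1> <a>para</a> visit to this website https://www.google.com "
        "for more information about the product. and you can find the product "
        "at this address 1234")

print(strip_anchor_tags(html))
```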


In [None]:
# Text Normalization

user_input = "This is a SAMPLE string to be transformed."
mode = "lowercase"  # "uppercase" or "lowercase"

transformed_string = client.text_normalize(user_input, mode)

print(transformed_string)

this is a sample string to be transformed.


In [None]:
# Text Spellcheck

user_input = '''we wlli oderr pzzia adn buregsr at nghti'''
corrected_text = client.text_spellcheck(user_input)
print(corrected_text)

we will order pizza and burgers at night


In [None]:
# Text Semantic Role Labeling

user_input = '''John ate the delicious pizza with gusto.'''
srl_result = client.text_srl(user_input)
print(srl_result)

{'Predicate': 'ate', 'Agent': 'John', 'Theme': 'the delicious pizza'}


In [None]:
# Text Clustering

user_input = '''
"The company reported record profits for the third quarter.", "The latest fashion trends for spring and summer are unveiled.",
"Profits soared in the third quarter, reaching unprecedented levels.", "Tips for improving productivity in the workplace."
'''
clusters = client.text_cluster(user_input)
print(clusters)

{0: ['"The company reported record profits for the third quarter."', '"Profits soared in the third quarter, reaching unprecedented levels."'], 1: ['"The latest fashion trends for spring and summer are unveiled."'], 2: ['"Tips for improving productivity in the workplace."']}


In [None]:
# Text Sentiment Analysis

user_input = '''Congratulations! You have won a lottery of $1,000,000!'''
num_classes = "very positive, positive, neutral, negative, very negative"

answer = client.text_sentiment(user_input, num_classes, explanation=True)
print(answer)

{'prediction': 'very positive', 'explanation': 'The text expresses excitement and a positive feeling, indicating a very positive sentiment.'}


In [None]:
# Text Topic Classification

user_input = '''a ghost is chasing me in the dark forest. I am scared and running for my life. I hope I can make it out alive.'''
num_classes = "story, horror, comedy"

answer = client.text_topic(user_input, num_classes, explanation=True)
print(answer)

{'prediction': 'horror', 'explanation': 'The text is about a ghost chasing the speaker in a dark forest, which is a common theme in horror stories. The speaker is also scared and running for their life, which adds to the sense of fear and suspense.'}


In [None]:
# Text Part of Speech Tagging

user_input = '''I love Lamborghini, but Bugatti is even better. Although, Mercedes is a class above all.'''

answer = client.detect_pos(user_input)

answer

[('I', 'pronoun'),
 ('love', 'verb'),
 ('Lamborghini', 'noun'),
 (',', 'punctuation'),
 ('but', 'conjunction'),
 ('Bugatti', 'noun'),
 ('is', 'verb'),
 ('even', 'adverb'),
 ('better', 'adjective'),
 ('Although', 'conjunction'),
 ('Mercedes', 'noun'),
 ('is', 'verb'),
 ('a', 'determiner'),
 ('class', 'noun'),
 ('above', 'preposition'),
 ('all', 'pronoun')]

In [None]:
# Text Paraphrase Detection

user_input = ["Google has updated their search engine.", "The search engine has been updated by Google."]

answer = client.text_paraphrase(user_input, explanation=True)
print(answer)

{'prediction': 'yes', 'explanation': 'The two sentences have the same meaning but use different sentence structures. Sentence 1 is in the active voice, while Sentence 2 is in the passive voice. The subject of Sentence 1 (Google) is the object of Sentence 2, and the object of Sentence 1 (the search engine) is the subject of Sentence 2.'}


In [None]:
# Text OCR

image_path = "image.jpg"
prompt = "whose fees voucher is this?"

extracted_text = client.text_ocr(image_path, prompt)
print(extracted_text)

 This is Fareed Hassan Khan's fee voucher.


In [None]:
# Text Segmentation

user_input = '''The sun gently rose above the horizon, painting the sky with hues of pink, orange, red etc. like rainbow. Birds greeted the dawn with melodious songs, i.e. their chirping filling the air with a sense of serenity. Dew glistened on the grass, sparkling like diamonds in the morning light. A cool breeze whispered through the trees, carrying the scent of blooming flowers.'''

sentences = client.text_segment(user_input, logical=True)
print(sentences)

['The sun gently rose above the horizon, painting the sky with hues of pink, orange, red etc. like rainbow.', 'Birds greeted the dawn with melodious songs, i.e. their chirping filling the air with a sense of serenity.', 'Dew glistened on the grass, sparkling like diamonds in the morning light.', 'A cool breeze whispered through the trees, carrying the scent of blooming flowers.']
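
The abbreviations `etc.` and `i.e.` are what make this input tricky: splitting after every period breaks sentences in the middle. The sketch below contrasts a naive regex split with a small abbreviation-aware splitter. Both are illustrative baselines with a hand-picked abbreviation list, not the logic behind `text_segment`.

```python
import re

text = ("The sun gently rose above the horizon, painting the sky with hues of "
        "pink, orange, red etc. like rainbow. Birds greeted the dawn with "
        "melodious songs, i.e. their chirping filling the air with a sense of serenity.")

# Naive approach: split wherever sentence punctuation is followed by whitespace.
naive = re.split(r"(?<=[.!?])\s+", text)

# Hand-picked list; a real abbreviation list would be much longer.
ABBREVIATIONS = {"etc.", "i.e.", "e.g."}

def segment(text):
    """Split on sentence-final punctuation, except after known abbreviations."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".!?" and word.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences

print(len(naive), len(segment(text)))
```

The abbreviation list has its own blind spot: a sentence that genuinely ends in "etc." will be merged with the next one, which is the kind of case a model-based segmenter is meant to handle.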


In [None]:
# Text Emoji Expansion

user_input = "😂😂😂 how can that be possible?"
expanded_user_input = client.text_emojis(user_input)
print(expanded_user_input)

Laughing out loud, how can that be possible?


In [None]:
# Text TF-IDF

documents_list = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog and the cat.",
    "The quick brown fox jumps over the lazy dog and the cat in the park."
]

ngrams_size = 2  # Size of n-grams

output_type = 'all'  # 'tfidf', 'ngrams', 'all'

tfidf_matrix = client.text_tfidf(documents_list, ngrams_size, output_type)

print(type(tfidf_matrix))

<class 'tuple'>
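
The call returns a tuple whose contents are not printed here. To make the idea concrete, the following plain-Python sketch computes bigram TF-IDF scores for the same documents using one common weighting (term frequency times `log(N / df)`). It is not the library's implementation, its scores need not match the library's, and the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Return the word n-grams of a text, lowercased, with periods removed."""
    words = text.lower().replace(".", "").split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tfidf(documents, n):
    """Return one {ngram: score} dict per document."""
    per_doc = [Counter(ngrams(doc, n)) for doc in documents]
    # Document frequency: in how many documents each n-gram appears.
    doc_freq = Counter(gram for counts in per_doc for gram in counts)
    total_docs = len(documents)
    scores = []
    for counts in per_doc:
        total = sum(counts.values())
        scores.append({
            gram: (count / total) * math.log(total_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return scores

documents_list = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog and the cat.",
    "The quick brown fox jumps over the lazy dog and the cat in the park.",
]

scores = tfidf(documents_list, 2)
print(scores[2]["in the"], scores[0]["the quick"])
```

Bigrams shared by every document, such as "the quick", score zero under this weighting, while bigrams unique to one document, such as "in the", score highest.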


In [None]:
# Text Idioms

user_input = "We need to be on the same page and not throw in the towel at the first sign of trouble."
extracted_idioms = client.text_idioms(user_input)
print(extracted_idioms)

['be on the same page', 'throw in the towel']


In [None]:
# Text Word Sense Disambiguation

user_input = '''The baseball player swung the bat with all his strength. A bat flew overhead as they walked through the woods at dusk.'''
word_to_disambiguate = 'bat'

meanings = client.text_sense_disambiguation(user_input, word_to_disambiguate)
print(meanings)

['Word to Disambiguate: bat', 'Meaning1: The wooden implement used in baseball games (in the first sentence)', 'Meaning2: Nocturnal flying mammal (in the second sentence)']


In [None]:
# Text Word Frequency

user_input = "The quick brown fox jumps over the lazy dog and black fox."
words = ["fox", "quick", "dog"]  # Specific words to calculate frequency for, or None for all words

frequency = client.text_word_frequency(user_input, words)
print(frequency)

{'fox': 2, 'quick': 1, 'dog': 1}
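
Word counts are deterministic, so they are easy to verify locally. A minimal sketch with `collections.Counter`; the tokenization rule (lowercase runs of letters, digits and apostrophes) is an assumption and may differ from what `text_word_frequency` does.

```python
import re
from collections import Counter

def word_frequency(text, words=None):
    """Count case-insensitive word occurrences; restrict to `words` if given."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    if words is None:
        return dict(counts)
    return {word: counts[word.lower()] for word in words}

text = "The quick brown fox jumps over the lazy dog and black fox."
print(word_frequency(text, ["fox", "quick", "dog"]))
```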


In [None]:
# Text Anomaly Detection

user_input = '''While hiking, I stumbled upon a cave filled with glowing mushrooms and a singing bear.'''
anomalies = client.text_anomaly(user_input)
print(anomalies)

['singing bear']


In [None]:
# Text Coreference Resolution

user_input = '''Emily and her brother James decided to go on a camping trip together. Teams were formed and they set out to explore the forest.'''

resolved_coreferences = client.text_coreference(user_input)
print(resolved_coreferences)

['her: Emily', 'brother: James', 'they: Emily and James']


In [None]:
# Text Badness

non_sarcastic_user_input = "I love driving on empty roads with no traffic."

analysis_type = "sarcasm"  # "profanity", "bias", "sarcasm"
threshold = "BLOCK_NONE"  # "BLOCK_NONE", "BLOCK_ONLY_HIGH", "BLOCK_MEDIUM_AND_ABOVE", "BLOCK_LOW_AND_ABOVE"

contains_language = client.text_badness(non_sarcastic_user_input, analysis_type, threshold)
print(contains_language, type(contains_language))

False <class 'bool'>
