### Example - Translator - English to Tamil, Hindi

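```python
import requests
from urllib.parse import quote


def translate_english_to_tamil(text):
    try:
        # quote() percent-encodes the text so spaces and characters such as '&', '#' or '?'
        # cannot break the query string
        url = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=ta&dt=t&q=" + quote(text, safe="")
        response = requests.get(url, timeout=10)  # timeout avoids hanging indefinitely
        response.raise_for_status()  # turn HTTP errors (e.g. rate limiting) into exceptions
        result = response.json()
        # result[0][0][0] is the translated text of the first segment
        return result[0][0][0]
    except Exception as e:
        print(f"An error occurred: {e}")
        return None


def translate_english_to_hindi(text):
    try:
        # Same request as above, but with the target language set to Hindi (tl=hi)
        url = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=hi&dt=t&q=" + quote(text, safe="")
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        result = response.json()
        return result[0][0][0]
    except Exception as e:
        print(f"An error occurred: {e}")
        return None


# Example usage
english_text = "How are you?"
tamil_translation = translate_english_to_tamil(english_text)
hindi_translation = translate_english_to_hindi(english_text)
print("Tamil Translation:", tamil_translation)
print("Hindi Translation:", hindi_translation)
```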


```plaintext
Tamil Translation: எப்படி இருக்கிறீர்கள்?
Hindi Translation: आप कैसे हैं?
```


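The program sends requests to a Google Translate web endpoint (`translate.googleapis.com/translate_a/single`) to translate text from English to Tamil and Hindi. This endpoint is undocumented and separate from Google's official, key-based Cloud Translation API, so its behaviour and response format may change without notice. Here's a step-by-step explanation of how the program works: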

### 1. **Imports the Required Library**
   - The program imports the `requests` library, which allows it to send HTTP requests to the Google Translate API.

### 2. **Function Definitions**
   - **`translate_english_to_tamil(text)`**:
     - This function translates English text to Tamil.
   - **`translate_english_to_hindi(text)`**:
     - This function translates English text to Hindi.

### 3. **How Each Function Works**
   - **URL Construction**:
     - The URL used in each function is constructed based on the Google Translate API endpoint.
     - For Tamil translation:
       ```python
       url = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=ta&dt=t&q=" + text
       ```
       - `client=gtx`: Specifies the client type.
       - `sl=en`: Source language is English (`en`).
       - `tl=ta`: Target language is Tamil (`ta`).
       - `dt=t`: Specifies the data type (translation).
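       - `q=`: Followed by the text to be translated. The text should be percent-encoded (for example with `urllib.parse.quote`), because characters such as `&` or `#` would otherwise be read as part of the URL.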

     - For Hindi translation, the target language (`tl`) is changed to Hindi (`hi`).

   - **Sending a Request**:
     - The program sends a GET request to the constructed URL using `requests.get(url)`.

   - **Parsing the Response**:
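     - The response body is JSON; `response.json()` parses it into nested Python lists.
     - The program then indexes into that structure to get the translation:
       ```python
       result[0][0][0]
       ```
       - `result[0]` is the list of translated segments, `result[0][0]` is the first segment, and `result[0][0][0]` is that segment's translated text. Longer input is typically split into several segments (roughly one per sentence), so for multi-sentence text this returns only the first sentence's translation.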

   - **Handling Exceptions**:
     - The `try-except` block catches any error raised during the request or while parsing the response, prints an error message, and returns `None`.
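The URL-construction step concatenates the raw text onto the query string. The offline sketch below (no network call) shows why the text should be percent-encoded first, using the standard library's `urllib.parse`. `build_url` is a hypothetical helper, not part of the original program, and the sample text is arbitrary:

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE = "https://translate.googleapis.com/translate_a/single"


def build_url(text, target_lang, source_lang="en"):
    """Hypothetical helper: build the query string with every value percent-encoded."""
    params = {"client": "gtx", "sl": source_lang, "tl": target_lang, "dt": "t", "q": text}
    return BASE + "?" + urlencode(params)


text = "Salt & pepper?"
naive = BASE + "?client=gtx&sl=en&tl=ta&dt=t&q=" + text  # raw concatenation
safe = build_url(text, "ta")

# Parse both query strings back to see what a server would receive as `q`
print(parse_qs(urlsplit(naive).query)["q"])  # ['Salt ']  (the '&' cut the text short)
print(parse_qs(urlsplit(safe).query)["q"])   # ['Salt & pepper?']
print(safe)
```

In practice `requests` can do the same encoding itself if the parameters are passed as a dictionary via `requests.get(BASE, params=...)`, which avoids building the string by hand.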

### 4. **Example Usage**
   - The `english_text` variable is set to `"How are you?"`.
   - The program calls both translation functions:
     - `translate_english_to_tamil(english_text)` translates `"How are you?"` to Tamil.
     - `translate_english_to_hindi(english_text)` translates `"How are you?"` to Hindi.

### 5. **Output**
   - The program prints the translations:
     ```plaintext
     Tamil Translation: எப்படி இருக்கிறீர்கள்?
     Hindi Translation: आप कैसे हैं?
     ```

### Summary
- The program sends a request to the Google Translate API for each translation.
- It retrieves and prints the translated text in Tamil and Hindi by accessing the relevant parts of the JSON response.
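"Accessing the relevant parts of the JSON response" can be checked offline. Because the endpoint is undocumented, the response below is a hypothetical, trimmed sample (placeholder strings instead of real translations). It assumes the shape the program relies on: `result[0]` is a list of segments, each starting with the translated text followed by the source text. Under that assumption, `result[0][0][0]` returns only the first segment, while joining element 0 of every segment keeps multi-sentence input whole. `extract_translation` is a hypothetical helper:

```python
import json

# Hypothetical, trimmed response body for a two-sentence input; real responses
# contain more fields, and the placeholders stand in for actual translations
raw_body = '[[["<translated sentence 1> ","How are you? "],["<translated sentence 2>","I am fine."]],null,"en"]'
result = json.loads(raw_body)


def extract_translation(result):
    """Join the translated text (element 0) of every segment in result[0]."""
    return "".join(segment[0] for segment in result[0] if segment and segment[0])


print(result[0][0][0])              # only the first segment
print(extract_translation(result))  # all segments joined
```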

Natural Language Processing (NLP) plays a significant role in how the translation works behind the scenes, even though the program itself contains no NLP code: all of it is abstracted away behind the API call.

### How NLP is Involved in This Process

1. **Language Detection**:
   - Although the source language (`sl=en`) is explicitly specified in this program, in many cases, translation systems automatically detect the source language. NLP techniques are used to analyze the input text and determine its language based on patterns, vocabulary, and syntax.

2. **Tokenization**:
   - NLP breaks down the input text into smaller units like words or phrases (tokens). This is crucial for understanding the context and meaning of each part of the text during translation.

3. **Syntax and Grammar Analysis**:
   - NLP involves parsing the structure of sentences to understand the relationships between words. This helps in correctly translating phrases and sentences while maintaining grammatical correctness in the target language.

4. **Context Understanding**:
   - Advanced NLP models, such as those used by Google Translate, take the surrounding context of words and phrases into account. This allows them to choose the most appropriate translation for a word with several meanings (for example, "bank" as a riverbank or as a financial institution).

5. **Translation Memory and Statistical Models**:
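   - Before Google Translate began switching to neural machine translation in 2016, it relied on statistical machine translation, which learns translation probabilities from large parallel corpora of existing human translations. Translation memory (a database of previously translated segments that can be reused) is a related idea, used mainly in professional translation tools.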

6. **Machine Learning and Neural Networks**:
   - Modern machine translation relies on deep neural networks trained on massive amounts of multilingual data. These models learn to map text in one language to fluent text in another, and they generally produce more accurate translations than earlier statistical systems.

7. **Handling Ambiguity and Idiomatic Expressions**:
   - NLP techniques help in dealing with ambiguous terms, idiomatic expressions, and cultural nuances that can be tricky to translate directly. The system uses context and previous examples to provide more accurate translations.
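Item 1 mentions detecting a language from patterns in the text. Production systems use trained statistical or neural classifiers; the sketch below is only a toy heuristic that guesses among the three languages in this notebook from Unicode block ranges (Tamil U+0B80 to U+0BFF, Devanagari U+0900 to U+097F). It assumes one script per language and treats basic Latin letters as English; `guess_language` is hypothetical:

```python
from collections import Counter

# Unicode blocks used as a crude proxy for language (assumption: one script per language)
SCRIPT_RANGES = {
    "ta": (0x0B80, 0x0BFF),  # Tamil block
    "hi": (0x0900, 0x097F),  # Devanagari block (also used by Marathi, Nepali, ...)
}


def guess_language(text):
    """Toy detector: return the code whose script covers the most characters, or None."""
    counts = Counter()
    for ch in text:
        code = ord(ch)
        # Range check also catches vowel signs, which are combining marks, not letters
        for lang, (start, end) in SCRIPT_RANGES.items():
            if start <= code <= end:
                counts[lang] += 1
        if ch.isascii() and ch.isalpha():
            counts["en"] += 1  # basic Latin letters treated as English (another assumption)
    if not counts:
        return None  # no characters from any known script
    return counts.most_common(1)[0][0]


print(guess_language("How are you?"))
print(guess_language("எப்படி இருக்கிறீர்கள்?"))
print(guess_language("आप कैसे हैं?"))
```

The limitation is visible in the design: script alone cannot distinguish Hindi from other Devanagari languages, or English from French, which is why real detectors model vocabulary and character sequences instead.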

### In Summary
While the program itself makes only a simple API call, the system that powers Google Translate combines many of the NLP techniques above. They aim to make translations accurate, contextually appropriate, and grammatically correct. All of this heavy lifting happens on the server side; the program simply benefits from it through the API.

### Example - Corpus, Tokenize the corpus into words

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams           # used in the n-gram example below
from nltk.probability import FreqDist  # used in the n-gram example below
# word_tokenize needs the Punkt models; if they are missing, run
# nltk.download('punkt') (newer NLTK releases use 'punkt_tab')

# Define the small corpus
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "She sells sea shells by the seashore.",
    "Peter Piper picked a peck of pickled peppers.",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
    "Sally sells seashells by the seashore."
]

# Naive tokenization: split on whitespace (punctuation stays attached, e.g. 'dog.')
tokenized_corpus = [sentence.split() for sentence in corpus]
for sentence_tokens in tokenized_corpus:
    print(sentence_tokens)

# Lowercase, tokenize with NLTK (punctuation becomes separate tokens),
# and collect the set of unique tokens
unique_words = set()
for sentence in corpus:
    tokens = word_tokenize(sentence.lower())  # after the loop, holds the last sentence's tokens
    unique_words.update(tokens)

# Print the number of unique tokens (words plus the punctuation marks '.' and '?')
# print(sorted(unique_words))  # uncomment to list them
print(len(unique_words))
```


```plaintext
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog.']
['She', 'sells', 'sea', 'shells', 'by', 'the', 'seashore.']
['Peter', 'Piper', 'picked', 'a', 'peck', 'of', 'pickled', 'peppers.']
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a', 'woodchuck', 'could', 'chuck', 'wood?']
['Sally', 'sells', 'seashells', 'by', 'the', 'seashore.']
34
```
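The two tokenizers above differ mainly in how they treat punctuation: `split()` leaves it attached (`'dog.'`), while `word_tokenize` separates it. The dependency-free sketch below approximates that behaviour with a regular expression (a run of word characters, or a single punctuation mark). This regex is an assumption that happens to match `word_tokenize` on this small corpus but not in general (it splits contractions such as "don't" differently, for instance). `simple_tokenize` is hypothetical, and both vocabularies are lowercased so they can be compared:

```python
import re

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "She sells sea shells by the seashore.",
    "Peter Piper picked a peck of pickled peppers.",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
    "Sally sells seashells by the seashore."
]


def simple_tokenize(sentence):
    """Split into runs of word characters, or single non-space punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", sentence)


# Unique lowercased tokens under each scheme
split_vocab = {tok.lower() for s in corpus for tok in s.split()}
regex_vocab = {tok for s in corpus for tok in simple_tokenize(s.lower())}

print(len(split_vocab))  # 33: 'dog.', 'wood?', 'wood', ... are all distinct entries
print(len(regex_vocab))  # 34: same count as word_tokenize above
print(simple_tokenize("How much wood would a woodchuck chuck?"))
```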


### Example - n-grams

```python
# Generate unigrams, bigrams, and trigrams for each sentence in the corpus.
# n-grams are built per sentence, so no n-gram spans a sentence boundary.
unigrams = [gram for sentence in tokenized_corpus for gram in ngrams(sentence, 1)]
bigrams = [gram for sentence in tokenized_corpus for gram in ngrams(sentence, 2)]
trigrams = [gram for sentence in tokenized_corpus for gram in ngrams(sentence, 3)]

# Print the generated n-grams and how many there are
print("Unigrams:")
print(unigrams)
print(len(unigrams))

print("\nBigrams:")
print(bigrams)
print(len(bigrams))

print("\nTrigrams:")
print(trigrams)
print(len(trigrams))

# Build a bigram frequency distribution (count of each distinct bigram)
bigram_freq_dist = FreqDist(bigrams)
print(bigram_freq_dist)
```

```plaintext
Unigrams:
[('The',), ('quick',), ('brown',), ('fox',), ('jumps',), ('over',), ('the',), ('lazy',), ('dog.',), ('She',), ('sells',), ('sea',), ('shells',), ('by',), ('the',), ('seashore.',), ('Peter',), ('Piper',), ('picked',), ('a',), ('peck',), ('of',), ('pickled',), ('peppers.',), ('How',), ('much',), ('wood',), ('would',), ('a',), ('woodchuck',), ('chuck',), ('if',), ('a',), ('woodchuck',), ('could',), ('chuck',), ('wood?',), ('Sally',), ('sells',), ('seashells',), ('by',), ('the',), ('seashore.',)]
43

Bigrams:
[('The', 'quick'), ('quick', 'brown'), ('brown', 'fox'), ('fox', 'jumps'), ('jumps', 'over'), ('over', 'the'), ('the', 'lazy'), ('lazy', 'dog.'), ('She', 'sells'), ('sells', 'sea'), ('sea', 'shells'), ('shells', 'by'), ('by', 'the'), ('the', 'seashore.'), ('Peter', 'Piper'), ('Piper', 'picked'), ('picked', 'a'), ('a', 'peck'), ('peck', 'of'), ('of', 'pickled'), ('pickled', 'peppers.'), ('How', 'much'), ('much', 'wood'), ('wood', 'would'), ('would', 'a'), ('a', 'woodchuck'), 
```
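`nltk.util.ngrams` slides a window of length n over each token list. A minimal pure-Python sketch of the same idea, assuming the whitespace-split `tokenized_corpus` from the tokenization cell (rebuilt here so the block runs on its own), zips shifted copies of the list and uses `collections.Counter` in place of `FreqDist`. `make_ngrams` is a hypothetical helper:

```python
from collections import Counter

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "She sells sea shells by the seashore.",
    "Peter Piper picked a peck of pickled peppers.",
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
    "Sally sells seashells by the seashore."
]
tokenized_corpus = [sentence.split() for sentence in corpus]


def make_ngrams(tokens, n):
    """Return the n-gram tuples in `tokens`; zip stops at the shortest slice."""
    return list(zip(*(tokens[i:] for i in range(n))))


bigrams = [g for sent in tokenized_corpus for g in make_ngrams(sent, 2)]
trigrams = [g for sent in tokenized_corpus for g in make_ngrams(sent, 3)]
bigram_counts = Counter(bigrams)

# A sentence of k tokens yields k - n + 1 n-grams
print(len(bigrams), len(trigrams))   # 38 33
print(len(bigram_counts))            # 35 distinct bigrams
print(bigram_counts.most_common(3))  # the three bigrams that occur twice
```

The repeated bigrams (`('by', 'the')`, `('the', 'seashore.')`, `('a', 'woodchuck')`) are exactly what a frequency distribution is meant to surface in this tongue-twister corpus.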

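```python
from nltk.corpus import stopwords
# nltk.download('stopwords')  # run once if the stopword list is not installed

stop_words = set(stopwords.words('english'))
# `tokens` still holds the word_tokenize output for the LAST sentence of the corpus
# ("sally sells seashells by the seashore."), left over from the tokenization loop
filtered_tokens = [w for w in tokens if w not in stop_words]
print(filtered_tokens)
```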


```plaintext
['sally', 'sells', 'seashells', 'seashore', '.']
```


```python
from nltk.stem import PorterStemmer

# The Porter stemmer strips suffixes by rule, so stems need not be real words
stemmer = PorterStemmer()

stems = [stemmer.stem(word) for word in filtered_tokens]
print(stems)
```


```plaintext
['salli', 'sell', 'seashel', 'seashor', '.']
```


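```python
from sklearn.feature_extraction.text import CountVectorizer

# Bag-of-words: each row is a document, each column a vocabulary term, each cell a count
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(["This is an example.", "Another example."])  # sparse matrix
print(X.toarray())                          # dense view of the document-term counts
print(vectorizer.get_feature_names_out())   # column order (alphabetical vocabulary)
```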


```plaintext
[[1 0 1 1 1]
 [0 1 1 0 0]]
['an' 'another' 'example' 'is' 'this']
```
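With default settings, `CountVectorizer` lowercases the text, keeps tokens of two or more word characters (its default `token_pattern` is `r"(?u)\b\w\w+\b"`), sorts the vocabulary alphabetically, and counts each term per document. The pure-Python sketch below reproduces those steps only to show what the matrix above contains. `bag_of_words` is hypothetical and ignores the library's other options (stop words, n-gram ranges, sparse output):

```python
import re

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # same regex as CountVectorizer's default


def bag_of_words(docs):
    """Return (document-term count matrix as lists, sorted vocabulary)."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocabulary)}  # term -> column
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            row[index[tok]] += 1
        matrix.append(row)
    return matrix, vocabulary


matrix, vocabulary = bag_of_words(["This is an example.", "Another example."])
print(matrix)      # [[1, 0, 1, 1, 1], [0, 1, 1, 0, 0]]
print(vocabulary)  # ['an', 'another', 'example', 'is', 'this']
```

Note that the single-character pattern requirement means words like "a" or "I" would be dropped, which is why the default vectorizer never produces a column for them.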
