## Experiment 8
## Identify Parts of Speech Using the Penn Treebank Tag Set

**Penn Treebank and POS Tagging Overview:**

- *Penn Treebank:* A widely used corpus of English text annotated with part-of-speech tags and syntactic parse trees. Its tag set is a common standard for English POS tagging.
- *POS Tagging:* Part-of-speech tagging assigns a grammatical category (e.g., noun, verb, adjective) to each word in a text.
- *Importance:* A basic preprocessing step in natural language processing; later stages such as parsing and lemmatization rely on the tags.
- *Applications:* Machine translation, information retrieval, and sentiment analysis benefit from accurate POS tagging.
- *Challenges:* Many words belong to more than one category (for example, *book* can be a noun or a verb), so the correct tag depends on the surrounding context.
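
To make the ambiguity problem concrete, here is a toy sketch in which the previous tag decides between two candidate tags for the same word. The mini-lexicon `LEXICON`, the function `toy_tag`, and the two context rules are hypothetical and written only for illustration; real taggers, including the one used later in this experiment, learn such preferences from annotated data rather than from hand-written rules.

```python
# Toy illustration only: a hand-written lexicon and two context rules.
# This is NOT how NLTK's tagger works internally.

# Hypothetical mini-lexicon: each word maps to the Penn Treebank tags it may take.
LEXICON = {
    "i": ["PRP"],
    "want": ["VBP"],
    "to": ["TO"],
    "book": ["NN", "VB"],  # ambiguous: noun or base-form verb
    "a": ["DT"],
    "the": ["DT"],
    "flight": ["NN"],
    "read": ["VBD"],
}

def toy_tag(tokens):
    """Tag tokens left to right, using the previous tag to resolve ambiguity."""
    tagged = []
    prev_tag = None
    for token in tokens:
        # Words missing from the lexicon default to NN in this sketch.
        candidates = LEXICON.get(token.lower(), ["NN"])
        if len(candidates) == 1:
            tag = candidates[0]
        elif prev_tag in ("TO", "MD") and "VB" in candidates:
            tag = "VB"  # after "to" or a modal, prefer the base-form verb
        elif prev_tag == "DT" and "NN" in candidates:
            tag = "NN"  # after a determiner, prefer the noun
        else:
            tag = candidates[0]
        tagged.append((token, tag))
        prev_tag = tag
    return tagged

print(toy_tag("I want to book a flight".split()))
print(toy_tag("I read the book".split()))
```

In the first sentence *book* follows *to* and is tagged `VB`; in the second it follows *the* and is tagged `NN`.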


**Part-of-Speech (POS) Tags and Detailed Description:**

- **CC:** Coordinating conjunction
  - *Example:* and, but, or

- **CD:** Cardinal number
  - *Example:* 1, 2, three

- **DT:** Determiner
  - *Example:* the, a, an

- **EX:** Existential there
  - *Example:* *there* in "There is", "there are"

- **FW:** Foreign word
  - *Example:* bon appétit, pièce de résistance

- **IN:** Preposition or subordinating conjunction
  - *Example:* in, on, under, while

- **JJ:** Adjective
  - *Example:* happy, tall, blue

- **JJR:** Adjective, comparative
  - *Example:* happier, taller, bluer

- **JJS:** Adjective, superlative
  - *Example:* happiest, tallest, bluest

- **LS:** List item marker
  - *Example:* 1., 2., (a), (b)

- **MD:** Modal
  - *Example:* can, could, will

- **NN:** Noun, singular or mass
  - *Example:* cat, love, happiness

- **NNS:** Noun, plural
  - *Example:* cats, loves, happinesses

- **NNP:** Proper noun, singular
  - *Example:* John, London, December

- **NNPS:** Proper noun, plural
  - *Example:* Americans, Alps, Smiths

- **PDT:** Predeterminer
  - *Example:* both, half, all

- **POS:** Possessive ending
  - *Example:* 's, '

- **PRP:** Personal pronoun
  - *Example:* I, you, he, she

- **PRP\$:** Possessive pronoun
  - *Example:* my, your, his, her

- **RB:** Adverb
  - *Example:* quickly, softly, very

- **RBR:** Adverb, comparative
  - *Example:* faster, earlier, more

- **RBS:** Adverb, superlative
  - *Example:* fastest, earliest, most

- **RP:** Particle
  - *Example:* up, down, off

- **SYM:** Symbol
  - *Example:* +, =, < (the dollar sign has its own tag, `$`)

- **TO:** to
  - *Example:* *to* in "to go", "to dance", "to eat"

- **UH:** Interjection
  - *Example:* oh, wow, oops

- **VB:** Verb, base form
  - *Example:* run, jump, eat

- **VBD:** Verb, past tense
  - *Example:* ran, jumped, ate

- **VBG:** Verb, gerund or present participle
  - *Example:* running, jumping, eating

- **VBN:** Verb, past participle
  - *Example:* run, jumped, eaten

- **VBP:** Verb, non-3rd person singular present
  - *Example:* am, are, have

- **VBZ:** Verb, 3rd person singular present
  - *Example:* is, has, does

- **WDT:** Wh-determiner
  - *Example:* which, that, whatever

- **WP:** Wh-pronoun
  - *Example:* who, what, whom

- **WP\$:** Possessive wh-pronoun
  - *Example:* whose, whosever

- **WRB:** Wh-adverb
  - *Example:* when, where, why


These are the 36 word-level tags of the Penn Treebank scheme, each with a description and example words. Each tag names a grammatical category or syntactic function of a word in a sentence. Punctuation marks receive separate tags of their own, such as `.` and `,`.
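
When reading tagger output it can help to translate tags back into their descriptions. The sketch below does this with a plain dictionary. `TAG_DESCRIPTIONS` and `describe_tags` are hypothetical names, the dictionary holds only a subset of the tags listed above, and the tagged pairs are written by hand rather than produced by a tagger.

```python
# Descriptions copied from the tag list above; only a subset is included here.
TAG_DESCRIPTIONS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "VBZ": "Verb, 3rd person singular present",
    "RB": "Adverb",
}

def describe_tags(pos_tags):
    """Return (word, tag, description) triples for a list of (word, tag) pairs."""
    return [
        (word, tag, TAG_DESCRIPTIONS.get(tag, "not in this subset"))
        for word, tag in pos_tags
    ]

# Tagged pairs written by hand for illustration, not produced by a tagger.
sample = [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), ("quietly", "RB")]
for word, tag, description in describe_tags(sample):
    print(f"{word:10} {tag:5} {description}")
```

NLTK also provides `nltk.help.upenn_tagset()`, which prints descriptions from its own tag-set data, provided that resource has been downloaded.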

## Explanation of Code

1. **Import Libraries:**
   - `import nltk`: Import the Natural Language Toolkit (NLTK).
   - `from nltk.tokenize import word_tokenize`: Import the word_tokenize function for tokenization.

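2. **Download NLTK Data:**
   - `nltk.download('punkt')`: Download the Punkt models that `word_tokenize` uses.
   - `nltk.download('averaged_perceptron_tagger')`: Download the averaged perceptron model that `pos_tag` uses.
   - Recent NLTK releases (3.9 and later) look for the resources `punkt_tab` and `averaged_perceptron_tagger_eng` instead. If the code raises a `LookupError`, download the resource named in the error message.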

3. **Function Definition:**
   - `identify_parts_of_speech(text)`: Define a function that takes a text input, tokenizes it into words, and tags the parts of speech.

4. **Tokenization:**
   - `words = word_tokenize(text)`: Tokenize the input text into words.

5. **POS Tagging:**
   - `pos_tags = nltk.pos_tag(words)`: Use NLTK's `pos_tag` function to tag parts of speech for each word.

6. **Return Result:**
   - `return pos_tags`: Return the list of tuples containing words and their corresponding POS tags.

7. **Example Usage:**
   - `text = "This is a sample sentence."`: Define a sample text.
   - `parts_of_speech = identify_parts_of_speech(text)`: Call the function to identify parts of speech.
   - `print(parts_of_speech)`: Print the identified parts of speech for each word in the sample sentence.

```python
import nltk
from nltk.tokenize import word_tokenize

# Download the NLTK data (you only need to do this once)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

def identify_parts_of_speech(text):
    # Tokenize the text into words
    words = word_tokenize(text)

    # Use the pos_tag function to tag parts of speech
    pos_tags = nltk.pos_tag(words)

    return pos_tags

# Example usage
text = "This is a sample sentence."
parts_of_speech = identify_parts_of_speech(text)
print(parts_of_speech)
```


**Output:**

```
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sample', 'JJ'), ('sentence', 'NN'), ('.', '.')]
```

**Download log:**

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
```
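
Because the result is an ordinary list of `(word, tag)` tuples, it can be processed further with plain Python. The sketch below counts the tags and collects words by tag prefix, using the output of the example run as its input. The helper `words_with_prefix` is a hypothetical name introduced here for illustration.

```python
from collections import Counter

# Output copied from the example run above.
pos_tags = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'),
            ('sample', 'JJ'), ('sentence', 'NN'), ('.', '.')]

def words_with_prefix(tagged, prefix):
    """Collect words whose tag starts with the given prefix (e.g. 'NN' or 'VB')."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

# Count how often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in pos_tags)

print(tag_counts.most_common(1))          # [('DT', 2)]
print(words_with_prefix(pos_tags, "NN"))  # ['sentence']
print(words_with_prefix(pos_tags, "VB"))  # ['is']
```

Matching on a prefix works because related Penn Treebank tags share their first letters: `NN`, `NNS`, `NNP`, and `NNPS` are all nouns, and every verb tag begins with `VB`.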
