# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and Ollama, build a tool that takes a technical question
and responds with an explanation. You will be able to use this tool yourself during the course!

In [1]:
# imports
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import display, Markdown, update_display


In [2]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [3]:
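# set up environment
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    print("OPENAI_API_KEY not found; check your .env file")

# OpenAI() reads OPENAI_API_KEY from the environment
openai = OpenAI()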
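
The constants above define `MODEL_LLAMA`, but the client as configured only talks to OpenAI. A minimal sketch of how the same client class could be pointed at a local Ollama server follows. The helper `client_kwargs` is a hypothetical name introduced here, and the URL assumes Ollama is running locally on its default port with its OpenAI-compatible endpoint.

```python
# Hypothetical helper: maps a model name to keyword arguments for OpenAI(...)
OLLAMA_BASE_URL = "http://localhost:11434/v1"  # assumes a default local Ollama install

def client_kwargs(model: str) -> dict:
    """Return keyword arguments for OpenAI(...) suited to the given model."""
    if model.startswith("llama"):
        # Ollama does not check the key, but the client expects a non-empty value
        return {"base_url": OLLAMA_BASE_URL, "api_key": "ollama"}
    # No overrides: the client falls back to OPENAI_API_KEY from the environment
    return {}

print(client_kwargs("llama3.2"))
print(client_kwargs("gpt-4o-mini"))
```

With this in place, `OpenAI(**client_kwargs(MODEL_LLAMA))` would build a client for the local model, while `OpenAI(**client_kwargs(MODEL_GPT))` behaves like the plain `OpenAI()` call.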

In [4]:
system_message = "You are a helpful technical assistant. "
system_message += "Provide a detailed explanation for the question asked by the user. For added context, it could be an explanation of code \
or any other technical topic. Respond in Markdown. "
system_message += "Provide accurate answers. If you don't know the answer to something, just say so."

In [5]:
# here is the question; type over this to ask something new
question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

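As a reference point for checking the tool's answer, the snippet in the question can be run on sample data. This is a small sketch: the `books` list and the wrapper function `unique_authors` are hypothetical, since the question does not define either.

```python
# Hypothetical sample data; the original question does not define `books`
books = [
    {"title": "Emma", "author": "Jane Austen"},
    {"title": "Persuasion", "author": "Jane Austen"},
    {"title": "Untitled"},                  # no "author" key, so .get returns None
    {"title": "Anonymous", "author": ""},   # falsy author, filtered out
    {"title": "Dracula", "author": "Bram Stoker"},
]

def unique_authors(books):
    # The set comprehension removes duplicates and skips missing or empty authors;
    # `yield from` then yields each element of that set one at a time
    yield from {book.get("author") for book in books if book.get("author")}

authors = sorted(unique_authors(books))  # sets are unordered, so sort for display
print(authors)  # ['Bram Stoker', 'Jane Austen']
```
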
In [6]:
# Get gpt-4o-mini to answer, with streaming

def tech_advisor(user_prompt):
    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': user_prompt},
    ]

    stream = openai.chat.completions.create(model=MODEL_GPT, messages=messages, stream=True)
    response = ""
    display_handle = display(Markdown(''), display_id=True)
    for chunk in stream:
        # delta.content can be None, so fall back to an empty string
        response += chunk.choices[0].delta.content or ""
        # re-render the accumulated Markdown in place
        update_display(Markdown(response), display_id=display_handle.display_id)
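
The accumulation loop above can be tried without an API key. The sketch below uses a hypothetical `fake_stream` generator that only mimics the `chunk.choices[0].delta.content` shape the loop reads; it is not the OpenAI client and makes no network call.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like streaming chunks (hypothetical stand-in, no API call)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect(stream):
    """Accumulate streamed text, treating a None delta as an empty string."""
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
    return response

# A chunk can carry content=None, which is why the loop uses `or ""`
text = collect(fake_stream(["Hello", ", ", "world", None]))
print(text)  # Hello, world
```

Separating accumulation from display in this way is one option for testing the streaming logic outside a notebook.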


In [7]:
user_prompt = "Please explain the purpose of the NLTK library in Python"
tech_advisor(user_prompt)

# Purpose of Using the NLTK Library in Python

**Natural Language Toolkit (NLTK)** is a powerful library in Python used for natural language processing (NLP). It provides users with easy-to-use interfaces to over 50 corpora and lexical resources, such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and more. Here are some key purposes and features of the NLTK library:

## 1. **Tokenization**

Tokenization is the process of splitting text into smaller components, typically sentences or words. NLTK provides a simple interface to tokenize text:

```python
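import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# One-time download of tokenizer data (newer NLTK releases may need 'punkt_tab' instead)
nltk.download('punkt')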

text = "Hello, world. Welcome to natural language processing with NLTK."
sentences = sent_tokenize(text)  # Splits text into sentences
words = word_tokenize(text)       # Splits text into words
```

## 2. **Text Classification**

NLTK offers tools for text classification, allowing users to categorize text into predefined labels. This can help in tasks such as spam detection, sentiment analysis, and topic categorization.

```python
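from nltk.classify import NaiveBayesClassifier

# Toy training data as (feature dict, label) pairs; real tasks need far more examples
train = [({"free": True}, "spam"), ({"meeting": True}, "ham")]
classifier = NaiveBayesClassifier.train(train)
print(classifier.classify({"free": True}))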
```

## 3. **Stemming and Lemmatization**

Stemming strips affixes with heuristic rules to reduce a word to a root form (e.g., "running" to "run"); the result is not always a dictionary word. Lemmatization maps a word to its dictionary form (lemma) and depends on the part of speech you supply. NLTK provides both capabilities.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # Outputs: run

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("better", pos='a'))  # Outputs: good
```
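
To make the difference concrete, here is a toy sketch in plain Python. It is not the Porter algorithm and does not use WordNet: the suffix list and the `LEMMAS` lookup table are hypothetical and exist only to show why rule-based stripping cannot map "better" to "good" while a dictionary lookup can.

```python
# Toy illustration only: not the Porter algorithm and not WordNet
SUFFIXES = ("ing", "ed", "s")
LEMMAS = {("better", "a"): "good", ("ran", "v"): "run"}  # hypothetical lookup table

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, with no dictionary check."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word: str, pos: str) -> str:
    """Look up (word, part of speech); fall back to the word itself."""
    return LEMMAS.get((word, pos), word)

print(toy_stem("jumping"))           # jump
print(toy_stem("better"))            # better (no suffix rule applies)
print(toy_lemmatize("better", "a"))  # good
```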

## 4. **Part-of-Speech Tagging**

NLTK can tag words in a sentence with their respective part of speech (e.g., nouns, verbs, adjectives), which is crucial for understanding text structure and meaning.

```python
from nltk import pos_tag
from nltk.tokenize import word_tokenize

sentence = "NLTK is a powerful library for NLP."
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)  # Outputs: [('NLTK', 'NNP'), ('is', 'VBZ'), ...]
```

## 5. **Parsing and Syntactic Analysis**

NLTK supports different parsing techniques and allows users to analyze sentence structures, which can be crucial for tasks such as grammar checking and understanding sentence relationships.

```python
from nltk.parse.corenlp import CoreNLPParser

# Requires a Stanford CoreNLP server already running at this URL
parser = CoreNLPParser(url='http://localhost:9000')
parse_tree = next(parser.raw_parse("The cat sat on the mat."))
```

## 6. **Corpus Access**

The NLTK library provides easy access to a wide range of text corpora for training and testing NLP models. This includes famous datasets like the Brown corpus, Gutenberg corpus, etc.

```python
from nltk.corpus import gutenberg

# Accessing texts from the Gutenberg corpus
gutenberg.fileids()  # Lists available texts
```

## 7. **Building Intelligent Systems**

NLTK allows for the building of more advanced systems, such as chatbots, recommendation systems, or any application where processing human language is necessary.

## 8. **Visualization**

NLTK uses Matplotlib for some of its built-in plots, such as `FreqDist.plot()` and `Text.dispersion_plot()`, making it easier to comprehend text analysis outcomes.

```python
import matplotlib.pyplot as plt
# Visualization code can be included here.
```
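
Most text visualizations start from word counts. As a rough sketch of that counting step, the example below uses only the standard library: `collections.Counter` stands in for NLTK's `FreqDist`, and a simple regular expression stands in for a proper tokenizer. The sample sentence is made up for illustration.

```python
from collections import Counter
import re

text = "the cat sat on the mat and the dog sat too"
# Simple regex split as a stand-in for a proper tokenizer
tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(tokens)

top = counts.most_common(2)
print(top)  # [('the', 3), ('sat', 2)]
```

The resulting `(word, count)` pairs are the kind of data a bar chart or frequency plot would then display.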

## Conclusion

The NLTK library is a comprehensive and versatile tool for anyone working with text data in Python. It simplifies many complex NLP tasks through its user-friendly interfaces and extensive functionalities. Whether you are doing basic tokenization or working on more complex NLP applications, NLTK provides the necessary tools to effectively process and analyze natural language data.