<a href="https://colab.research.google.com/github/rezazamani2329/AIML-UC-Berkeley-Generative-AI/blob/main/codio_assignment23_1_Translating_Data_into_Text.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Required Codio Assignment 23.1: Translating Data into Text

**Expected Time = 60 minutes**

**Total Points = 20**

In this assignment, you will generate a narrative from a dataframe without crafting an elaborate prompt. You will use the `pandas` library to convert the dataframe into a string that a language model can interpret. Next, you will use the Groq API, whose Python client follows the same chat completions interface as the OpenAI library, to have a pretrained model generate a narrative from the data.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)



## The Dataset

For this assignment, you will be using a dataset that contains a list of video games with global sales greater than 100,000 copies. The fields in this dataset include:

- Rank: The ranking of overall sales
- Name: The name of the video game
- Platform: The platform of the game's release (e.g., PC, PS4)
- Year: The year of the game's release
- Genre: The genre of the game
- Publisher: The publisher of the game
- NA_Sales: Sales in North America (in millions)
- EU_Sales: Sales in Europe (in millions)
- JP_Sales: Sales in Japan (in millions)
- Other_Sales: Sales in the rest of the world (in millions)
- Global_Sales: Total worldwide sales (in millions)
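The field list suggests that `Global_Sales` is the sum of the four regional columns. A minimal sketch checks this on the five rows that `data.head()` prints in this notebook, typed in by hand so the block runs without `vgsales.csv`; differences of about 0.01 are expected because every figure is rounded to two decimals.

```python
import pandas as pd

# The five rows printed by data.head(), copied by hand so this block is self-contained
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "NA_Sales": [41.49, 29.08, 15.85, 15.75, 11.27],
    "EU_Sales": [29.02, 3.58, 12.88, 11.01, 8.89],
    "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22],
    "Other_Sales": [8.46, 0.77, 3.31, 2.96, 1.0],
    "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37],
})

regional = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Row-wise sum of the regional columns versus the reported global total
sample["Regional_Sum"] = sample[regional].sum(axis=1)
sample["Difference"] = (sample["Global_Sales"] - sample["Regional_Sum"]).abs()

# DataFrame.round only affects the numeric columns; Name is left unchanged
print(sample[["Name", "Global_Sales", "Regional_Sum", "Difference"]].round(2))
```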


Run the code cell below to import the necessary libraries and the dataset.

In [9]:
```python
from groq import Groq
import pandas as pd
import openai
from openai import OpenAI

# Load the video game sales dataset
data = pd.read_csv('vgsales.csv')
```

In [None]:
```python
data.head()
```

|   | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


[Back to top](#-Index)

### Problem 1

#### Total Sales Calculation

**5 Points**


In the code cell below, use the `groupby` method on the `data` dataframe to calculate the total `Global_Sales` for each `Genre` of video games, and assign the result to the dataframe `total_sales`. Use the `reset_index` method with default arguments so that `Genre` becomes a regular column again.

In [10]:
```python
### GRADED
total_sales = ...


### BEGIN SOLUTION
total_sales = data.groupby('Genre')['Global_Sales'].sum().reset_index()
### END SOLUTION

### ANSWER CHECK
total_sales
```

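|   | Genre | Global_Sales |
|---|---|---|
| 0 | Action | 1751.18 |
| 1 | Adventure | 238.92 |
| 2 | Fighting | 448.91 |
| 3 | Idea Factory | 0.0 |
| 4 | Misc | 809.96 |
| 5 | Platform | 831.37 |
| 6 | Puzzle | 244.95 |
| 7 | Racing | 732.04 |
| 8 | Role-Playing | 927.37 |
| 9 | Shooter | 1037.37 |
| 10 | Simulation | 392.2 |
| 11 | Sony Computer Entertainment | 0.0 |
| 12 | Sports | 1330.93 |
| 13 | Strategy | 175.12 |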
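`reset_index()` is needed because `groupby(...)['Global_Sales'].sum()` returns a Series indexed by `Genre`. Passing `as_index=False` to `groupby` is an equivalent way to keep `Genre` as a column, and `sort_values` ranks the genres by sales. A small sketch on the five `data.head()` rows, hard-coded here so it runs on its own:

```python
import pandas as pd

# Genre and Global_Sales for the five rows shown by data.head()
rows = pd.DataFrame({
    "Genre": ["Sports", "Platform", "Racing", "Sports", "Role-Playing"],
    "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37],
})

# Approach used in the solution: aggregate, then move Genre back into a column
via_reset = rows.groupby("Genre")["Global_Sales"].sum().reset_index()

# Equivalent alternative: tell groupby not to use Genre as the index
via_as_index = rows.groupby("Genre", as_index=False)["Global_Sales"].sum()

# Rank genres from highest to lowest total sales, renumbering rows from 0
ranked = via_reset.sort_values("Global_Sales", ascending=False, ignore_index=True)
print(ranked)
```

Both forms produce the same dataframe; `as_index=False` simply avoids the extra method call.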


[Back to top](#-Index)

### Problem 2

#### Data Preparation

**5 Points**

Next, you will convert the `total_sales` dataframe to a string, which can be embedded directly in the prompt sent to the model.

In the code cell below, call the `to_string` method on the `total_sales` dataframe with the argument `index=False`, and assign the result to `data_string`. Additionally, to ensure that your data is formatted correctly, chain the `replace` method with the arguments `'\n', ' '` to replace any newline characters with spaces.

Next, define a dictionary `data_dict` with the key `"data"` whose value is `data_string`.

In [11]:
```python
### GRADED
data_string = ...
data_dict = ...


### BEGIN SOLUTION

data_string = total_sales.to_string(index=False).replace('\n', ' ')
data_dict = {"data": data_string}
### END SOLUTION

### ANSWER CHECK
data_dict
```

{'data': '                      Genre  Global_Sales                      Action       1751.18                   Adventure        238.92                    Fighting        448.91                Idea Factory          0.00                        Misc        809.96                    Platform        831.37                      Puzzle        244.95                      Racing        732.04                Role-Playing        927.37                     Shooter       1037.37                  Simulation        392.20 Sony Computer Entertainment          0.00                      Sports       1330.93                    Strategy        175.12'}
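The `to_string` output above keeps the fixed-width column padding, so the prompt contains long runs of spaces, and once the newlines are replaced the row boundaries are only implied by spacing. A sketch using three genre totals from `total_sales` compares it with `to_csv(index=False)`, shown here as one possible alternative rather than the format the assignment asks for:

```python
import pandas as pd

# Three genre totals taken from the total_sales output
subset = pd.DataFrame({
    "Genre": ["Action", "Sports", "Shooter"],
    "Global_Sales": [1751.18, 1330.93, 1037.37],
})

# Approach used in Problem 2: fixed-width text with newlines swapped for spaces
data_string = subset.to_string(index=False).replace("\n", " ")

# Alternative: comma-separated rows with no padding, one row per line
csv_string = subset.to_csv(index=False)

print(repr(data_string))
print(csv_string)
```

The CSV form is shorter and keeps explicit row and column delimiters, which may make the table easier for a model to parse; either string can be placed in `data_dict`.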

[Back to top](#-Index)

### Problem 3

#### Obtaining Your Groq API Key

**10 Points**

To call the model, you need an API key. In this assignment, you will obtain a free API key from Groq, whose Python client mirrors the interface of the OpenAI library. To access Groq and get your personal API key, follow the steps below:

1. **Create a Groq Account:**
Visit the [Groq website](https://console.groq.com/login) and sign up if you don't already have an account.
2. **Log In:**
Log in to the Groq platform with your credentials.
3. **Navigate to the API Dashboard:**
Once logged in, navigate to the [Groq playground](https://console.groq.com/playground).
4. **Generate an API Key:**
In the menu on the left, click API Keys, then click the Create API Key button. Copy your API key and store it in a secure place.


You can now use this API key in your applications to authenticate requests to Groq's API.

Once you have generated your API key, call the `Groq` constructor in the code cell below, passing your API key as a string to the `api_key` argument, and assign the result to the `client` variable.

In [12]:
```python
### GRADED
import os
client = Groq(api_key="...",)


### BEGIN SOLUTION

client = Groq(
    # Replace "your_key" with your Groq API key
    api_key="your_key",
)
### END SOLUTION

### ANSWER CHECK
client
```

<groq.Groq at 0x7f707e5cc390>

[Back to top](#-Index)
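Hard-coding the key in the notebook, as in the solution cell above, makes it easy to leak when the notebook is shared. Groq's Python client reads the `GROQ_API_KEY` environment variable by default when `api_key` is omitted. The sketch below uses a hypothetical helper, `load_groq_api_key`, that reads the same variable and fails with a clear message if it is missing:

```python
import os


def load_groq_api_key(var_name: str = "GROQ_API_KEY") -> str:
    """Hypothetical helper: return the API key stored in an environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Groq API key."
        )
    return key


# Usage (requires the groq package and a real key in the environment):
# from groq import Groq
# client = Groq(api_key=load_groq_api_key())
```

Failing early with a readable error is easier to debug than an authentication error returned by the first API request.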

### Problem 4

#### Generating the Text Narrative from Your Data

**This question is not graded**

In the code cell below, you will generate a narrative based on the data in the dataframe using the Groq chat completions API.

Complete the code as follows:

- Use the `chat.completions.create()` method on the `client` object to send a chat request and receive a response.
- Inside the `chat.completions.create()` method, set the `messages` parameter, which defines the conversation history or prompt sent to the model, to a list containing the single message `{"role": "user", "content": f"Analyze the following data: {data_dict['data']}"}`.
- Inside the `chat.completions.create()` method, set the `model` parameter to `"llama3-8b-8192"`.

Assign the returned object to the `chat_completion` variable, then store the generated text, `chat_completion.choices[0].message.content`, in the `narrative` variable.

Once you are done, run the code cell to display the narrative.

Lastly, replace the newline characters in `narrative` with spaces so that the narrative displays as a single block of text.
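The `messages` argument is a list of dictionaries, each with a `role` and a `content` key. A minimal sketch builds that list from `data_dict` with a hypothetical helper, `build_messages`, so the exact prompt can be inspected before any request is sent:

```python
def build_messages(data_dict: dict) -> list:
    """Hypothetical helper: wrap the data string in a single user message."""
    return [
        {
            "role": "user",
            "content": f"Analyze the following data: {data_dict['data']}",
        }
    ]


# A shortened stand-in for the data_dict built in Problem 2
example_dict = {"data": "Genre  Global_Sales Action  1751.18"}
messages = build_messages(example_dict)
print(messages[0]["content"])
```

In the graded cell, the same list could be passed as `client.chat.completions.create(messages=build_messages(data_dict), model="llama3-8b-8192")`.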

In [13]:
```python
### GRADED
chat_completion = ...
narrative = ...


### BEGIN SOLUTION
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "user", "content": f"Analyze the following data: {data_dict['data']}"}
    ],
    model="llama3-8b-8192",
)

# Extract the generated text from the first choice
narrative = chat_completion.choices[0].message.content

# Collapse newlines so the narrative displays on a single line
narrative.replace('\n', ' ')
### END SOLUTION
```

"Based on the provided data, here's an analysis of the global sales by genre and parent company/developer:  **Top 3 Genres:**  1. Action: 1751.18 million 2. Role-Playing: 927.37 million 3. Platform: 831.37 million  These three genres account for over 45% of the total global sales.  **Other Notable Genres:**  * Fighting: 448.91 million * Racing: 732.04 million * Shooter: 1037.37 million * Strategy: 175.12 million  These genres contribute significantly to the overall global sales, with the shooter genre being particularly popular.  **Parent Company/Developer:**  * Idea Factory: 0.00 million (no significant presence in the global sales)  This may indicate that Idea Factory has not developed or published any games that have achieved significant commercial success in the period covered by the data.  **Notable Absence:**  * Sony Computer Entertainment: 0.00 million (no significant presence in the global sales)  This is surprising since Sony is a well-established and prominent player in the g
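The generated narrative is not guaranteed to match the data. For example, it lists Role-Playing and Platform among the top three genres, while `total_sales` shows Sports (1330.93) and Shooter (1037.37) ahead of both, and the three genres it lists account for about 39% of the total rather than "over 45%". A sketch that recomputes the ranking and shares directly from the Problem 1 totals, copied from the `data_string` output so the block runs on its own:

```python
import pandas as pd

# Genre totals from the Problem 1 output (as printed in data_string in Problem 2)
totals = pd.DataFrame({
    "Genre": ["Action", "Adventure", "Fighting", "Idea Factory", "Misc",
              "Platform", "Puzzle", "Racing", "Role-Playing", "Shooter",
              "Simulation", "Sony Computer Entertainment", "Sports", "Strategy"],
    "Global_Sales": [1751.18, 238.92, 448.91, 0.00, 809.96,
                     831.37, 244.95, 732.04, 927.37, 1037.37,
                     392.20, 0.00, 1330.93, 175.12],
})

# Rank genres by total sales and compute each genre's share of the grand total
ranked = totals.sort_values("Global_Sales", ascending=False, ignore_index=True)
ranked["Share"] = ranked["Global_Sales"] / ranked["Global_Sales"].sum()

top3 = ranked.head(3)
print(top3)
print(f"Top-3 share: {top3['Share'].sum():.1%}")
```

Checking a few of the model's numerical claims against the dataframe in this way is a cheap safeguard before relying on the narrative.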