<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated using the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a dataframe.
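
For the JupyterLab case mentioned in the note, a minimal sketch might look like the following. It assumes pandas is installed and the kernel has network access; `load_dataset` is a hypothetical helper name that does not appear in the lab.

```python
import pandas as pd

def load_dataset(source):
    """Read a CSV from a local path or an http(s) URL into a DataFrame."""
    # Outside JupyterLite, pandas.read_csv can fetch a URL by itself
    return pd.read_csv(source)

# Usage in JupyterLab (requires network access):
# df = load_dataset(URL)
```

The same helper also accepts the local `dataset.csv` path, so the rest of the notebook does not need to change between the two environments.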


In [3]:
from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

path = URL

await download(path, "dataset.csv")

---


# Test Environment


In [10]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts

import pandas as pd

def read_csv_to_dataframe(file_path):
    """
    Reads a CSV file located at the provided file path into a Pandas DataFrame.
    
    Args:
    file_path (str): The path to the CSV file.
    
    Returns:
    pandas.DataFrame: The data read from the CSV file.
    """
    try:
        # Reading the CSV file into DataFrame
        df = pd.read_csv(file_path)
        return df
    except FileNotFoundError:
        print(f"The file at {file_path} was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
    
    return None

# Example usage:
# Assuming `file_path` is already defined as the path to your CSV
file_path = "./dataset.csv"
df = read_csv_to_dataframe(file_path)

if df is not None:
    print(df.head())

   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  
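
The `Unnamed: 0` column in the output above is the name pandas assigns to a CSV column whose header is blank, which usually means the file was saved together with its index. The sketch below shows, on a toy CSV that is an assumption and not the lab dataset, how `index_col=0` would absorb that column into the index instead.

```python
import io
import pandas as pd

# Toy CSV with a blank first header, mimicking a file saved with its index
csv_text = ",Manufacturer,Price\n0,Acer,978\n1,Dell,634\n"

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())    # includes 'Unnamed: 0'
print(indexed.columns.tolist())  # only the data columns
```

The cells that follow keep the default behavior, so their outputs still show `Unnamed: 0`.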


In [11]:
def identify_missing_values(df):
    """
    Identifies columns in a DataFrame that contain missing values.
    
    Args:
    df (pandas.DataFrame): The input DataFrame.
    
    Returns:
    List[str]: A list of column names with missing values.
    """
    # Check for NaN values in each column
    missing_columns = df.columns[df.isnull().any()].tolist()
    
    return missing_columns

missing_cols = identify_missing_values(df)
print("Columns with missing values:", missing_cols)

Columns with missing values: ['Screen_Size_cm', 'Weight_kg']
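
Knowing how many values are missing in each column is often as useful as knowing which columns are affected. A small sketch using `isnull().sum()` on a toy DataFrame (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Screen_Size_cm": [35.56, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan],
    "Price": [978, 634, 946],
})

# Count NaN entries per column and keep only the affected columns
missing_counts = toy.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)
```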


In [13]:
def fill_missing_values(df):
    """
    Replaces missing values in the DataFrame based on column dtype.
    
    For categorical (object-dtype) columns, missing values are replaced with the most frequent value.
    For numeric columns (like "Screen_Size_cm" and "Weight_kg"), missing values are replaced with the mean value.
    
    Args:
    df (pandas.DataFrame): The input DataFrame.
    
    Returns:
    pandas.DataFrame: The DataFrame with missing values filled.
    """
    # Identify columns and their data types
    categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
    numeric_cols = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
    
    # Fill missing values based on column type
    for col in categorical_cols:
        if df[col].isnull().any():
            mode_value = df[col].mode()[0]
            df[col].fillna(mode_value, inplace=True)
    
    for col in numeric_cols:
        if df[col].isnull().any():
            mean_value = df[col].mean()
            df[col].fillna(mean_value, inplace=True)
    
    return df

In [14]:
fill_missing_values(df)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(mean_value, inplace=True)


| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.560|1.6|8|256|1.60|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.20|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.20|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|IPS Panel|2|1|7|35.560|2.6|8|256|1.70|1891|
|234|234|Toshiba|3|Full HD|2|1|5|33.782|2.4|8|256|1.20|1950|
|235|235|Lenovo|4|IPS Panel|2|1|5|30.480|2.6|8|256|1.36|2236|
|236|236|Lenovo|3|Full HD|3|1|5|39.624|2.5|6|256|2.40|883|
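
The warning printed with this output is triggered by calling `fillna(..., inplace=True)` on the intermediate object `df[col]`. The sketch below uses the assign-back form that the warning text itself suggests. It also names the columns explicitly: `Screen_Size_cm` is stored as a number, so a dtype-based split fills it with the mean, whereas the lab task treats it as categorical and asks for the most frequent value. `fill_missing_explicit` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def fill_missing_explicit(df, mode_cols, mean_cols):
    """Return a copy with mode_cols filled by mode and mean_cols by mean."""
    out = df.copy()
    for col in mode_cols:
        # mode() can return several values; take the first one
        out[col] = out[col].fillna(out[col].mode()[0])
    for col in mean_cols:
        out[col] = out[col].fillna(out[col].mean())
    return out

toy = pd.DataFrame({
    "Screen_Size_cm": [39.624, 39.624, 35.56, np.nan],
    "Weight_kg": [1.0, 2.0, np.nan, 3.0],
})
filled = fill_missing_explicit(toy, ["Screen_Size_cm"], ["Weight_kg"])
print(filled)
```

Working on a copy leaves the input untouched, which makes the cell safe to re-run.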


In [15]:
def convert_columns_to_float(df, column1, column2):
    """
    Converts specified columns in a DataFrame to float data type.
    
    Args:
    df (pandas.DataFrame): The input DataFrame.
    column1 (str): The name of the first column to be converted to float.
    column2 (str): The name of the second column to be converted to float.
    
    Returns:
    pandas.DataFrame: The DataFrame with updated column data types.
    """
    # Check if columns exist in the DataFrame
    if column1 not in df.columns or column2 not in df.columns:
        raise ValueError(f"One or both of '{column1}' and '{column2}' are not columns in the DataFrame.")
    
    # Convert specified columns to float
    df[column1] = df[column1].astype(float)
    df[column2] = df[column2].astype(float)
    
    return df
    
updated_df = convert_columns_to_float(df, 'Screen_Size_cm', 'Weight_kg')
print(updated_df)

     Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0             0         Acer         4  IPS Panel    2   1         5   
1             1         Dell         3    Full HD    1   1         3   
2             2         Dell         3    Full HD    1   1         7   
3             3         Dell         4  IPS Panel    2   1         5   
4             4           HP         4    Full HD    2   1         7   
..          ...          ...       ...        ...  ...  ..       ...   
233         233       Lenovo         4  IPS Panel    2   1         7   
234         234      Toshiba         3    Full HD    2   1         5   
235         235       Lenovo         4  IPS Panel    2   1         5   
236         236       Lenovo         3    Full HD    3   1         5   
237         237      Toshiba         3    Full HD    2   1         5   

     Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0            35.560            1.6       8             2

In [16]:
def convert_units(df):
    """
    Converts 'Screen_Size_cm' to 'Screen_Size_inch' and 'Weight_kg' to 'Weight_pounds' in a DataFrame.
    
    Args:
    df (pandas.DataFrame): The input DataFrame.
    
    Returns:
    pandas.DataFrame: The DataFrame with converted units and renamed columns.
    """
    # Conversion factors
    CM_TO_INCH = 0.393701
    KG_TO_POUND = 2.20462
    
    # Perform unit conversions
    df['Screen_Size_inch'] = df['Screen_Size_cm'].apply(lambda x: x * CM_TO_INCH)
    df['Weight_pounds'] = df['Weight_kg'].apply(lambda x: x * KG_TO_POUND)
    
    # Remove the old columns
    df.drop(['Screen_Size_cm', 'Weight_kg'], axis=1, inplace=True)
    
    return df

In [17]:
convert_units(df)

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|1.6|8|256|978|14.000008|3.527392|
|1|1|Dell|3|Full HD|1|1|3|2.0|4|256|634|15.600008|4.850164|
|2|2|Dell|3|Full HD|1|1|7|2.7|8|256|946|15.600008|4.850164|
|3|3|Dell|4|IPS Panel|2|1|5|1.6|8|128|1244|13.300007|2.689636|
|4|4|HP|4|Full HD|2|1|7|1.8|8|256|837|15.600008|4.210824|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|IPS Panel|2|1|7|2.6|8|256|1891|14.000008|3.747854|
|234|234|Toshiba|3|Full HD|2|1|5|2.4|8|256|1950|13.300007|2.645544|
|235|235|Lenovo|4|IPS Panel|2|1|5|2.6|8|256|2236|12.000006|2.998283|
|236|236|Lenovo|3|Full HD|3|1|5|2.5|6|256|883|15.600008|5.291088|
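
Values such as 14.000008 in this output come from the rounded factor 0.393701. One inch is exactly 2.54 cm and one pound is exactly 0.45359237 kg, so dividing by those constants avoids the small excess. The sketch below is a vectorized variant under that assumption; `convert_units_exact` is a hypothetical name, and it returns a copy so that re-running the cell does not fail on columns that were already dropped.

```python
import pandas as pd

CM_PER_INCH = 2.54           # exact by definition
KG_PER_POUND = 0.45359237    # exact by definition

def convert_units_exact(df):
    """Return a copy with inch and pound columns replacing cm and kg."""
    out = df.copy()
    out["Screen_Size_inch"] = out["Screen_Size_cm"] / CM_PER_INCH
    out["Weight_pounds"] = out["Weight_kg"] / KG_PER_POUND
    return out.drop(columns=["Screen_Size_cm", "Weight_kg"])

toy = pd.DataFrame({"Screen_Size_cm": [35.56, 39.624], "Weight_kg": [1.6, 2.2]})
converted = convert_units_exact(toy)
print(converted.round(3))
```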


In [18]:
import numpy as np

def normalize_cpu_frequency(df):
    """
    Normalizes the 'CPU_frequency' column of a DataFrame.
    
    Normalizes the values in the 'CPU_frequency' column to a range between 0 and 1, using the maximum value in the column.
    
    Args:
    df (pandas.DataFrame): The input DataFrame containing 'CPU_frequency' column.
    
    Returns:
    pandas.DataFrame: The DataFrame with normalized 'CPU_frequency' column.
    """
    # Assuming all CPU frequencies are positive
    max_freq = df['CPU_frequency'].max()
    
    if max_freq == 0:
        raise ValueError("Maximum CPU frequency value cannot be zero for normalization.")
    
    # Normalize the 'CPU_frequency' column
    df['CPU_frequency'] = df['CPU_frequency'].apply(lambda x: (x - 0) / max_freq)
    
    return df

In [19]:
normalized_df = normalize_cpu_frequency(df)
print(normalized_df)

     Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0             0         Acer         4  IPS Panel    2   1         5   
1             1         Dell         3    Full HD    1   1         3   
2             2         Dell         3    Full HD    1   1         7   
3             3         Dell         4  IPS Panel    2   1         5   
4             4           HP         4    Full HD    2   1         7   
..          ...          ...       ...        ...  ...  ..       ...   
233         233       Lenovo         4  IPS Panel    2   1         7   
234         234      Toshiba         3    Full HD    2   1         5   
235         235       Lenovo         4  IPS Panel    2   1         5   
236         236       Lenovo         3    Full HD    3   1         5   
237         237      Toshiba         3    Full HD    2   1         5   

     CPU_frequency  RAM_GB  Storage_GB_SSD  Price  Screen_Size_inch  \
0         0.551724       8             256    978         14.000

In [25]:
def one_hot_encode(df, attribute):
    """
    Converts a categorical attribute in a DataFrame into dummy/indicator variables.
    
    Appends these variables to the original DataFrame and drops the original attribute column.
    
    Args:
    df (pandas.DataFrame): The input DataFrame.
    attribute (str): The name of the attribute column to be converted.
    
    Returns:
    pandas.DataFrame: The modified DataFrame with the encoded attribute and appended indicator variables.
    """
    # Check if the attribute exists in the DataFrame
    if attribute not in df.columns:
        raise ValueError(f"The attribute '{attribute}' does not exist in the DataFrame.")
    
    # Create dummy variables
    df1 = pd.get_dummies(df[attribute], prefix=attribute)
    
    # Append dummy variables to the original DataFrame
    df = pd.concat([df, df1], axis=1)
    
    # Drop the original attribute column
    df.drop(attribute, axis=1, inplace=True)
    
    return df

In [26]:
# Perform one-hot encoding on 'Screen' attribute
df = one_hot_encode(df, 'Screen')
print(df)

<class 'ValueError'>: The attribute 'Screen' does not exist in the DataFrame.
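
An error like this typically appears when the encoding cell is executed a second time, after an earlier run has already replaced `Screen` with indicator columns and assigned the result back to `df`. That reading is an inference from the jump in cell numbers, not something the output states. A minimal sketch of a re-run-safe variant, where `one_hot_encode_if_present` is a hypothetical name:

```python
import pandas as pd

def one_hot_encode_if_present(df, attribute):
    """One-hot encode `attribute`, or return df unchanged if it is absent."""
    if attribute not in df.columns:
        # Assumption: a missing column means an earlier run already encoded it
        print(f"'{attribute}' not found; leaving the DataFrame unchanged.")
        return df
    dummies = pd.get_dummies(df[attribute], prefix=attribute)
    return pd.concat([df.drop(columns=[attribute]), dummies], axis=1)

toy = pd.DataFrame({"Screen": ["IPS Panel", "Full HD"], "Price": [978, 634]})
once = one_hot_encode_if_present(toy, "Screen")
twice = one_hot_encode_if_present(once, "Screen")  # second call is a no-op
print(twice.columns.tolist())
```

Silently skipping a missing column also hides genuine typos in the column name, so the stricter `ValueError` remains the better choice outside interactive use.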

In [27]:
print(df)

     Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0             0         Acer         4    2   1         5       0.551724   
1             1         Dell         3    1   1         3       0.689655   
2             2         Dell         3    1   1         7       0.931034   
3             3         Dell         4    2   1         5       0.551724   
4             4           HP         4    2   1         7       0.620690   
..          ...          ...       ...  ...  ..       ...            ...   
233         233       Lenovo         4    2   1         7       0.896552   
234         234      Toshiba         3    2   1         5       0.827586   
235         235       Lenovo         4    2   1         5       0.896552   
236         236       Lenovo         3    3   1         5       0.862069   
237         237      Toshiba         3    2   1         5       0.793103   

     RAM_GB  Storage_GB_SSD  Price  Screen_Size_inch  Weight_pounds  \
0         8     

In [28]:
def convert_usd_to_eur(df, price_column='Price'):
    """
    Converts prices from USD to EUR in the specified column of a DataFrame.
    
    The exchange rate used is hardcoded at 0.85 (1 USD = 0.85 EUR).
    
    Args:
    df (pandas.DataFrame): The input DataFrame containing the 'Price' column.
    price_column (str): The name of the column containing prices in USD.
    
    Returns:
    pandas.DataFrame: The DataFrame with prices converted to EUR.
    """
    # Hardcoded exchange rate
    exchange_rate = 0.85
    
    # Conversion
    df[price_column] = df[price_column].apply(lambda x: x * exchange_rate)
    
    return df
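
Because the function above overwrites `Price` in place, running its cell twice applies the exchange rate twice. A sketch of a variant that takes the rate as a parameter and writes to a separate column; `convert_price` and the `Price_EUR` column name are assumptions, not part of the lab.

```python
import pandas as pd

def convert_price(df, rate, source="Price", target="Price_EUR"):
    """Return a copy with `target` = `source` * `rate`; `source` is kept."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    out = df.copy()
    out[target] = out[source] * rate
    return out

toy = pd.DataFrame({"Price": [1000, 600]})
in_eur = convert_price(toy, rate=0.85)
print(in_eur)
```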

In [29]:
# Convert USD prices to EUR
df_eur = convert_usd_to_eur(df)

print(df_eur)

     Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0             0         Acer         4    2   1         5       0.551724   
1             1         Dell         3    1   1         3       0.689655   
2             2         Dell         3    1   1         7       0.931034   
3             3         Dell         4    2   1         5       0.551724   
4             4           HP         4    2   1         7       0.620690   
..          ...          ...       ...  ...  ..       ...            ...   
233         233       Lenovo         4    2   1         7       0.896552   
234         234      Toshiba         3    2   1         5       0.827586   
235         235       Lenovo         4    2   1         5       0.896552   
236         236       Lenovo         3    3   1         5       0.862069   
237         237      Toshiba         3    2   1         5       0.793103   

     RAM_GB  Storage_GB_SSD    Price  Screen_Size_inch  Weight_pounds  \
0         8   

In [30]:
import numpy as np

def min_max_normalize(df, column_name='CPU_frequency'):
    """
    Performs min-max normalization on the specified column in a DataFrame.
    
    Args:
    df (pandas.DataFrame): The input DataFrame.
    column_name (str): The name of the column to normalize.
    
    Returns:
    pandas.DataFrame: The DataFrame with the normalized column.
    """
    if column_name not in df.columns:
        raise ValueError(f"The '{column_name}' column does not exist in the DataFrame.")
        
    # Calculate min and max values
    min_val = df[column_name].min()
    max_val = df[column_name].max()
    
    if min_val == max_val:
        raise ValueError(f"All values in '{column_name}' column are identical; cannot perform normalization.")
    
    # Apply min-max normalization
    df[column_name] = (df[column_name] - min_val) / (max_val - min_val)
    
    return df

In [31]:
# Normalizing CPU_frequency column
normalized_df = min_max_normalize(df)
print(normalized_df)

     Unnamed: 0 Manufacturer  Category  GPU  OS  CPU_core  CPU_frequency  \
0             0         Acer         4    2   1         5       0.235294   
1             1         Dell         3    1   1         3       0.470588   
2             2         Dell         3    1   1         7       0.882353   
3             3         Dell         4    2   1         5       0.235294   
4             4           HP         4    2   1         7       0.352941   
..          ...          ...       ...  ...  ..       ...            ...   
233         233       Lenovo         4    2   1         7       0.823529   
234         234      Toshiba         3    2   1         5       0.705882   
235         235       Lenovo         4    2   1         5       0.823529   
236         236       Lenovo         3    3   1         5       0.764706   
237         237      Toshiba         3    2   1         5       0.647059   

     RAM_GB  Storage_GB_SSD    Price  Screen_Size_inch  Weight_pounds  \
0         8   
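
By the time this cell ran, `CPU_frequency` had already been divided by its maximum in an earlier cell. Min-max normalization is unaffected by that step, because scaling a column by a positive constant scales its minimum and maximum by the same constant. A small check on toy values (the four frequencies are those of the first rows printed earlier; `min_max` is a hypothetical helper):

```python
import pandas as pd

def min_max(series):
    """Min-max normalize a Series to the range [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())

raw = pd.Series([1.6, 2.0, 2.7, 1.8])
scaled_by_max = raw / raw.max()

# Largest absolute difference between the two normalized versions
max_diff = (min_max(raw) - min_max(scaled_by_max)).abs().max()
same = max_diff < 1e-12
print(same)
```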

### My comments:
----
#### Practicing with Gen AI:
In this notebook, we practice writing generative AI prompts so that the model generates code that fits our needs.
In this example, we try to make the data cleaning/wrangling and preprocessing steps quicker and more efficient for a given dataset.

The AI-generated code was produced from specific prompts that I wrote, based on the instructions specified in the lab:

**Examples of the prompts that were given to the AI (IBM Granite 3.2 8B (Reasoning)):**

**prompt 1:**

```
Write a Python code that can perform the following tasks.
Read the CSV file, located on a given file path, into a Pandas data frame, assuming that the first rows of the file are the headers for the data.
```

**prompt 2:** 
```
Write a Python code that identifies the columns with missing values in a pandas data frame.
```

**prompt 3:**
```
Write a Python code to replace the missing values in a pandas data frame, per the following guidelines.
1. For a categorical attribute "Screen_Size_cm", replace the missing values with the most frequent value in the column.
2. For a continuous value attribute "Weight_kg", replace the missing values with the mean value of the entries in the column.
```
**prompt 4:**
```
Write a Python code snippet to change the data type of the attributes "Screen_Size_cm" and "Weight_kg" of a data frame to float.
```

**prompt 5:**
```
write a python code that performs min-max normalization on a dataframe that contains a column called "CPU_frequency"
```

**prompt 6:**
```
Write a Python code to modify the contents under the following attributes of the data frame as required.
1. Data under 'Screen_Size_cm' is assumed to be in centimeters. Convert this data into inches. Modify the name of the attribute to 'Screen_Size_inch'.
2. Data under 'Weight_kg' is assumed to be in kilograms. Convert this data into pounds. Modify the name of the attribute to 'Weight_pounds'.
```

**prompt 7:**
```
Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df concerning its maximum value. Make changes to the original data, and do not create a new attribute.
```

**prompt 8:**
```
Write a Python code to perform the following tasks.
1. Convert a data frame df attribute "Screen", into indicator variables, saved as df1, with the naming convention "Screen_<unique value of the attribute>".
2. Append df1 into the original data frame df.
3. Drop the original attribute from the data frame df.
```

**prompt 9:**
```
write a python code that convert the dataframe column named "Price" which has prices in US Dollars to Euros
```

**prompt 10:**
```
write a python code that performs min-max normalization on a dataframe that contains a column called CPU_frequency
```

#### Overall, it does the job well enough for the tasks required in this lab.

----

## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
