<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

In this lab, you will learn how to use generative AI to create Python code that can:

- Handle missing values in the dataset
- Correct the data types of the required dataset attributes
- Perform standardization and normalization on the required parameters
- Convert categorical data into numerical indicator variables

# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


In [30]:
# Install seaborn with the %pip magic
%pip install seaborn
# piplite is JupyterLite's in-browser package installer
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Paste the URL provided in the GenAI lab into the cell below.


In [31]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are using a downloaded version of this notebook and running it in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset into a DataFrame.


In [32]:
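from pyodide.http import pyfetch

async def download(url, filename):
    # Fetch the file over HTTP with Pyodide's async pyfetch helper
    response = await pyfetch(url)
    if response.status == 200:
        # Write the raw bytes to the notebook's local file system
        with open(filename, "wb") as f:
            f.write(await response.bytes())
    else:
        # Fail loudly instead of silently leaving no file behind
        raise RuntimeError(f"Download failed with HTTP status {response.status}")

path = URL

await download(path, "dataset.csv")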
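
Outside JupyterLite, as the note above says, `pandas.read_csv()` can read the URL directly, so the download step is unnecessary. The sketch below wraps that call in a hypothetical helper, `load_dataset`, and demonstrates it on a small in-memory CSV (two rows copied from the dataset preview) so it runs without network access. In JupyterLab you would call `load_dataset(URL)` instead.

```python
import io

import pandas as pd


def load_dataset(source):
    """Hypothetical helper: read a CSV from a URL, a local path, or a file-like object."""
    # pandas.read_csv accepts URLs, file paths, and file-like objects alike
    return pd.read_csv(source)


# In-memory stand-in for the remote file so the example runs offline
sample = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")
demo = load_dataset(sample)
print(demo.shape)  # (2, 2)
```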

---


# Test Environment


In [33]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts
import pandas as pd

def read_csv_to_dataframe(file_path):
    # Try to read the CSV file into a DataFrame
    try:
        # Use pandas' read_csv function to read the file
        df = pd.read_csv(file_path)
        print("CSV file successfully read into DataFrame.")
        return df
    except FileNotFoundError:
        print(f"Error: The file at path {file_path} was not found.")
    except pd.errors.EmptyDataError:
        print(f"Error: The file at path {file_path} is empty or does not contain any data.")
    except pd.errors.ParserError:
        print(f"Error: There was an issue parsing the file at path {file_path}.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    # Every error path ends up here, so callers receive None on failure
    return None


file_path = 'dataset.csv'
df = read_csv_to_dataframe(file_path)

CSV file successfully read into DataFrame.


In [34]:
df.head()

|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.56|1.6|8|256|1.6|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.2|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.2|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|


In [35]:
import pandas as pd

def find_columns_with_missing_values(df):
    """
    Identify columns in the DataFrame that contain missing values (NaN).

    Parameters:
    df (pandas.DataFrame): The input DataFrame.

    Returns:
    list: Column names with missing values.
    """
    # Check for NaN values in each column
    cols_with_missing = []
    for column in df.columns:
        if df[column].isnull().any():
            cols_with_missing.append(column)
    
    return cols_with_missing

find_columns_with_missing_values(df)

['Screen_Size_cm', 'Weight_kg']
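
The loop above checks one column at a time. A vectorized alternative counts the missing entries in every column at once with `DataFrame.isnull().sum()` and keeps only the nonzero counts. This is a minimal sketch on a small made-up DataFrame; the values are illustrative and not taken from the full dataset.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one gap in each of two columns
sample = pd.DataFrame({
    "Screen_Size_cm": [35.56, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan],
    "Price": [978, 634, 946],
})

# Count NaN values per column, then keep only columns that have any
missing_counts = sample.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts.to_dict())  # {'Screen_Size_cm': 1, 'Weight_kg': 1}
```

Unlike the list returned by `find_columns_with_missing_values`, this also tells you how many values each column is missing, which can help when choosing a fill strategy.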

In [36]:
import pandas as pd

def fill_missing_values(df):
    # Fill missing screen sizes with the most frequent value (mode)
    if 'Screen_Size_cm' in df.columns:
        df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(df['Screen_Size_cm'].mode()[0])

    # Fill missing weights with the column mean
    if 'Weight_kg' in df.columns:
        df['Weight_kg'] = df['Weight_kg'].fillna(df['Weight_kg'].mean())

    return df

df = fill_missing_values(df)
df.head()


|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.56|1.6|8|256|1.6|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.2|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.2|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|
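
Both columns can also be filled in a single call by passing a dictionary that maps each column name to its fill value to `DataFrame.fillna`, which avoids chained `inplace` calls entirely. A minimal sketch on a made-up frame, using the same strategy as above (mode for screen size, mean for weight):

```python
import numpy as np
import pandas as pd

# Illustrative frame with one gap per column
sample = pd.DataFrame({
    "Screen_Size_cm": [35.56, 39.624, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, 2.2, np.nan],
})

# One fill value per column: mode for screen size, mean for weight
fill_values = {
    "Screen_Size_cm": sample["Screen_Size_cm"].mode()[0],
    "Weight_kg": sample["Weight_kg"].mean(),
}

# fillna with a dict fills each column with its own value and returns a new frame
filled = sample.fillna(fill_values)
print(filled.isnull().sum().sum())  # 0
```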


In [37]:
# Make sure both cleaned columns use a floating-point data type
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

df.head()

|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|Screen_Size_cm|CPU_frequency|RAM_GB|Storage_GB_SSD|Weight_kg|Price|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|35.56|1.6|8|256|1.6|978|
|1|1|Dell|3|Full HD|1|1|3|39.624|2.0|4|256|2.2|634|
|2|2|Dell|3|Full HD|1|1|7|39.624|2.7|8|256|2.2|946|
|3|3|Dell|4|IPS Panel|2|1|5|33.782|1.6|8|128|1.22|1244|
|4|4|HP|4|Full HD|2|1|7|39.624|1.8|8|256|1.91|837|
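
The cast above succeeds because both columns are already numeric. When a numeric column arrives as text, possibly with a stray non-numeric entry, `pd.to_numeric` with `errors="coerce"` is a common way to convert it: values that cannot be parsed become `NaN` and can then be handled like any other missing value. The column below is invented for illustration.

```python
import pandas as pd

# Illustrative column stored as strings, with one unparseable entry
raw = pd.DataFrame({"Weight_kg": ["1.6", "2.2", "unknown"]})

# Convert to float; entries that cannot be parsed become NaN
raw["Weight_kg"] = pd.to_numeric(raw["Weight_kg"], errors="coerce")
print(raw["Weight_kg"].dtype)           # float64
print(raw["Weight_kg"].isnull().sum())  # 1
```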


In [38]:
# Convert Screen_Size_cm from cm to inches
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701
# Drop the old column since it's no longer needed
df = df.drop(columns=['Screen_Size_cm'])
    
# Convert Weight_kg from kg to pounds
df['Weight_pounds'] = df['Weight_kg'] * 2.20462
# Drop the old column since it's no longer needed
df = df.drop(columns=['Weight_kg'])

df.head()

|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|1.6|8|256|978|14.000008|3.527392|
|1|1|Dell|3|Full HD|1|1|3|2.0|4|256|634|15.600008|4.850164|
|2|2|Dell|3|Full HD|1|1|7|2.7|8|256|946|15.600008|4.850164|
|3|3|Dell|4|IPS Panel|2|1|5|1.6|8|128|1244|13.300007|2.689636|
|4|4|HP|4|Full HD|2|1|7|1.8|8|256|837|15.600008|4.210824|
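
The factor 0.393701 is a rounded form of 1/2.54 (one inch is defined as exactly 2.54 cm), which is why the first laptop's 35.56 cm screen appears as 14.000008 inches rather than exactly 14. Dividing by the exact definition avoids that drift; likewise, one pound is defined as exactly 0.45359237 kg, and 2.20462 is a rounded reciprocal of it. A minimal sketch comparing the two approaches on values from the preview:

```python
import pandas as pd

CM_PER_INCH = 2.54         # exact by definition
KG_PER_POUND = 0.45359237  # exact by definition

sample = pd.DataFrame({"Screen_Size_cm": [35.56, 39.624], "Weight_kg": [1.6, 2.2]})

# Rounded factor used in the lab vs. division by the exact definition
approx = sample["Screen_Size_cm"] * 0.393701
exact = sample["Screen_Size_cm"] / CM_PER_INCH
print(approx.round(6).tolist())  # [14.000008, 15.600008]
print(exact.round(6).tolist())   # [14.0, 15.6]

# Weight in pounds using the exact definition
pounds = sample["Weight_kg"] / KG_PER_POUND
print(pounds.round(6).tolist())
```

For an exploratory lab the difference is negligible, but the exact form keeps round-number sizes round.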


In [39]:
# Normalize 'CPU_frequency' to the range [0, 1] with min-max scaling: (x - min) / (max - min)
min_value = df['CPU_frequency'].min()
max_value = df['CPU_frequency'].max()

df['CPU_frequency'] = (df['CPU_frequency'] - min_value) / (max_value - min_value)

df.head()

|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|0.235294|8|256|978|14.000008|3.527392|
|1|1|Dell|3|Full HD|1|1|3|0.470588|4|256|634|15.600008|4.850164|
|2|2|Dell|3|Full HD|1|1|7|0.882353|8|256|946|15.600008|4.850164|
|3|3|Dell|4|IPS Panel|2|1|5|0.235294|8|128|1244|13.300007|2.689636|
|4|4|HP|4|Full HD|2|1|7|0.352941|8|256|837|15.600008|4.210824|
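
The lab objectives also list standardization. Where min-max normalization maps values into [0, 1], standardization (z-score scaling) subtracts the mean and divides by the standard deviation, so the result has mean 0 and standard deviation 1. A minimal sketch on the five original `CPU_frequency` values from the first preview (1.6, 2.0, 2.7, 1.6, 1.8); note that pandas' `Series.std()` computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Original (pre-normalization) CPU_frequency values from the dataset preview
cpu = pd.Series([1.6, 2.0, 2.7, 1.6, 1.8], name="CPU_frequency")

# z-score: subtract the mean, divide by the sample standard deviation (ddof=1)
standardized = (cpu - cpu.mean()) / cpu.std()

print(standardized.round(4).tolist())
```

Applying it to a column of `df` follows the same pattern, for example on a column that has not already been normalized.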


In [40]:
# Convert the 'Screen' attribute into indicator variables
df1 = pd.get_dummies(df['Screen'], prefix='Screen')
# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)
# Drop the original 'Screen' attribute from the data frame
df.drop('Screen', axis=1, inplace=True)
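
`pd.get_dummies` creates one indicator column per distinct category, with column names built from the prefix and the category label. In pandas 2.x the indicators are boolean by default, which is why the later preview shows `True`/`False`; passing `dtype=int` produces 0/1 integers instead. A sketch on three made-up rows:

```python
import pandas as pd

screens = pd.DataFrame({"Screen": ["IPS Panel", "Full HD", "Full HD"]})

# One indicator column per distinct category, encoded as 0/1 integers
indicators = pd.get_dummies(screens["Screen"], prefix="Screen", dtype=int)
print(indicators.columns.tolist())  # ['Screen_Full HD', 'Screen_IPS Panel']
print(indicators.values.tolist())   # [[0, 1], [1, 0], [1, 0]]
```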



In [41]:
# Rescale 'Price' to another currency using a fixed exchange rate
exchange_rate = 0.91
df['Price'] = df['Price'] * exchange_rate

df.head()

|Unnamed: 0.1|Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|Screen_Full HD|Screen_IPS Panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|2|1|5|0.235294|8|256|889.98|14.000008|3.527392|False|True|
|1|1|Dell|3|1|1|3|0.470588|4|256|576.94|15.600008|4.850164|True|False|
|2|2|Dell|3|1|1|7|0.882353|8|256|860.86|15.600008|4.850164|True|False|
|3|3|Dell|4|2|1|5|0.235294|8|128|1132.04|13.300007|2.689636|False|True|
|4|4|HP|4|2|1|7|0.352941|8|256|761.67|15.600008|4.210824|True|False|
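
Multiplying by the exchange rate rescales every price by the same factor; for the first laptop, 978 × 0.91 = 889.98. Because 0.91 has no exact binary floating-point representation, the product may carry a tiny residue, so rounding to two decimal places is a common final step for currency values. A sketch using the five original prices from the preview:

```python
import pandas as pd

exchange_rate = 0.91  # rate used in the lab
prices = pd.Series([978, 634, 946, 1244, 837], name="Price")

# Rescale by the exchange rate, then round to two decimal places
converted = (prices * exchange_rate).round(2)
print(converted.tolist())  # [889.98, 576.94, 860.86, 1132.04, 761.67]
```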


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
