<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


### Load Dataset

Write a Python code that can perform the following task.
Read the CSV file, located at a given file path, into a pandas data frame, assuming that the first row of the file contains the headers for the data.

### Building prompt to handle missing data

Write a Python code that identifies the columns with missing values in a pandas data frame.


Write a Python code to replace the missing values in a pandas data frame, per the following guidelines.

1. For a categorical attribute "Screen_Size_cm", replace the missing values with the most frequent value in the column.
2. For a continuous value attribute "Weight_kg", replace the missing values with the mean value of the entries in the column.

### Building prompt: modify data type
Write a Python code snippet to change the data type of the attributes "Screen_Size_cm" and "Weight_kg" of a data frame to float.

### Building prompt: Standardization and normalization
Write a Python code to modify the contents under the following attributes of the data frame as required.
1. Data under 'Screen_Size_cm' is assumed to be in centimeters. Convert this data into inches. Modify the name of the attribute to 'Screen_Size_inch'.
2. Data under 'Weight_kg' is assumed to be in kilograms. Convert this data into pounds. Modify the name of the attribute to 'Weight_pounds'.


Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df with respect to its maximum value. Make changes to the original data, and do not create a new attribute.

### Building prompt: Categorical to numerical value
Write a Python code to perform the following tasks.
1. Convert a data frame df attribute "Screen", into indicator variables, saved as df1, with the naming convention "Screen_<unique value of the attribute>".
2. Append df1 into the original data frame df.
3. Drop the original attribute from the data frame df.

### Exercise
Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df using min-max normalization. Make changes to the original data, and do not create a new attribute.
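
One possible answer to this exercise, sketched on made-up values rather than the lab dataset: min-max normalization rescales each entry as (x - min) / (max - min), so the smallest value maps to 0 and the largest to 1. The frame `df_demo` and its numbers are illustrative assumptions.

```python
import pandas as pd

# Hypothetical values standing in for the real 'CPU_frequency' column
df_demo = pd.DataFrame({"CPU_frequency": [1.6, 2.0, 2.4, 2.9]})

cpu_min = df_demo["CPU_frequency"].min()
cpu_max = df_demo["CPU_frequency"].max()

# Overwrite the existing column instead of creating a new attribute
df_demo["CPU_frequency"] = (df_demo["CPU_frequency"] - cpu_min) / (cpu_max - cpu_min)

print(df_demo)
```

Unlike dividing by the maximum alone, this form also shifts the data so that the minimum lands exactly at 0.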





In [25]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [26]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are using a downloaded version of this notebook and running it in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a data frame.
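
As a rough illustration of the note above, `pandas.read_csv` accepts a URL as well as a local path. To stay runnable offline, the sketch below reads a tiny in-memory sample instead; the `sample_csv` text is made up for illustration and is not taken from the dataset. The commented line shows where the URL would go.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real file
sample_csv = (
    "Manufacturer,Screen_Size_cm,Weight_kg\n"
    "Acer,35.56,1.6\n"
    "Dell,39.624,2.2\n"
)

# header=0 tells pandas that the first row holds the column names
sample_df = pd.read_csv(io.StringIO(sample_csv), header=0)
print(sample_df.head())

# In JupyterLab, with network access, the same call can take the URL directly:
# df = pd.read_csv(URL, header=0)
```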


In [27]:
from pyodide.http import pyfetch

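async def download(url, filename):
    # Fetch the file and save it locally; fail loudly instead of silently skipping
    response = await pyfetch(url)
    if response.status != 200:
        raise RuntimeError(f"Download failed with HTTP status {response.status}")
    with open(filename, "wb") as f:
        f.write(await response.bytes())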

path = URL

await download(path, "dataset.csv")

---


# Test Environment


In [28]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts
import pandas as pd

# Specify the file path of the CSV file
file_path = "dataset.csv"

# Read the CSV file into a Pandas data frame
df = pd.read_csv(file_path)

# Display the first few rows of the data frame
print(df.head())

   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  


In [29]:
# Identify columns with missing values
columns_with_missing_values = df.columns[df.isnull().any()].tolist()

# Display the columns with missing values
print("Columns with missing values:", columns_with_missing_values)

Columns with missing values: ['Screen_Size_cm', 'Weight_kg']
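
To see how many values are missing in each column, and not only which columns are affected, a per-column count can help. This is a small sketch on a made-up frame (`toy` and its values are illustrative assumptions), using the same `isnull()` call as the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in each of two columns
toy = pd.DataFrame({
    "Screen_Size_cm": [35.56, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan],
    "Price": [978, 634, 946],
})

# Count missing entries per column and keep only the columns that have any
missing_counts = toy.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]

print(missing_counts)
```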


In [30]:
df['Screen_Size_cm'].dtypes

dtype('float64')

In [31]:
df['Weight_kg'].dtypes

dtype('float64')

In [32]:
df.dtypes

Unnamed: 0          int64
Manufacturer       object
Category            int64
Screen             object
GPU                 int64
OS                  int64
CPU_core            int64
Screen_Size_cm    float64
CPU_frequency     float64
RAM_GB              int64
Storage_GB_SSD      int64
Weight_kg         float64
Price               int64
dtype: object
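
Both "Screen_Size_cm" and "Weight_kg" are already `float64` in this dataset, so no conversion is needed here. If they had been read as strings, a conversion along the following lines would address the data type prompt. The frame `toy` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame where the numeric columns arrived as strings
toy = pd.DataFrame({
    "Screen_Size_cm": ["35.56", "39.624"],
    "Weight_kg": ["1.6", "2.2"],
})

# Cast both attributes to float in one call
toy = toy.astype({"Screen_Size_cm": float, "Weight_kg": float})

print(toy.dtypes)
```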

In [33]:
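# Replace missing values in the continuous 'Weight_kg' column with the column mean
mean_weight = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_weight)

# Replace missing values in the categorical 'Screen_Size_cm' column with its most frequent value
most_frequent_screen_size = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_screen_size)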

In [34]:
# Identify columns with missing values
columns_with_missing_values = df.columns[df.isnull().any()].tolist()

# Display the columns with missing values
print("Columns with missing values:", columns_with_missing_values)

Columns with missing values: []


In [35]:
# Convert 'Screen_Size_cm' data from centimeters to inches
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701

# Convert 'Weight_kg' data from kilograms to pounds
df['Weight_pounds'] = df['Weight_kg'] * 2.20462

# Drop the original 'Screen_Size_cm' and 'Weight_kg' columns
df.drop(['Screen_Size_cm', 'Weight_kg'], axis=1, inplace=True)

# Display the modified data frame
df

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|1.6|8|256|978|14.000008|3.527392|
|1|1|Dell|3|Full HD|1|1|3|2.0|4|256|634|15.600008|4.850164|
|2|2|Dell|3|Full HD|1|1|7|2.7|8|256|946|15.600008|4.850164|
|3|3|Dell|4|IPS Panel|2|1|5|1.6|8|128|1244|13.300007|2.689636|
|4|4|HP|4|Full HD|2|1|7|1.8|8|256|837|15.600008|4.210824|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|IPS Panel|2|1|7|2.6|8|256|1891|14.000008|3.747854|
|234|234|Toshiba|3|Full HD|2|1|5|2.4|8|256|1950|13.300007|2.645544|
|235|235|Lenovo|4|IPS Panel|2|1|5|2.6|8|256|2236|12.000006|2.998283|
|236|236|Lenovo|3|Full HD|3|1|5|2.5|6|256|883|15.600008|5.291088|


In [36]:
df['CPU_frequency']/df['CPU_frequency'].max()

0      0.551724
1      0.689655
2      0.931034
3      0.551724
4      0.620690
         ...   
233    0.896552
234    0.827586
235    0.896552
236    0.862069
237    0.793103
Name: CPU_frequency, Length: 238, dtype: float64

In [37]:
# Normalize the 'CPU_frequency' attribute by dividing each value by the maximum value
max_cpu_frequency = df['CPU_frequency'].max()
df['CPU_frequency'] = df['CPU_frequency'] / max_cpu_frequency

In [38]:
df

| |Unnamed: 0|Manufacturer|Category|Screen|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|IPS Panel|2|1|5|0.551724|8|256|978|14.000008|3.527392|
|1|1|Dell|3|Full HD|1|1|3|0.689655|4|256|634|15.600008|4.850164|
|2|2|Dell|3|Full HD|1|1|7|0.931034|8|256|946|15.600008|4.850164|
|3|3|Dell|4|IPS Panel|2|1|5|0.551724|8|128|1244|13.300007|2.689636|
|4|4|HP|4|Full HD|2|1|7|0.620690|8|256|837|15.600008|4.210824|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|IPS Panel|2|1|7|0.896552|8|256|1891|14.000008|3.747854|
|234|234|Toshiba|3|Full HD|2|1|5|0.827586|8|256|1950|13.300007|2.645544|
|235|235|Lenovo|4|IPS Panel|2|1|5|0.896552|8|256|2236|12.000006|2.998283|
|236|236|Lenovo|3|Full HD|3|1|5|0.862069|6|256|883|15.600008|5.291088|


In [39]:
# Convert 'Screen' attribute into indicator variables
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)

# Drop the original 'Screen' attribute from the data frame
df.drop('Screen', axis=1, inplace=True)
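
Depending on the pandas version, `get_dummies` may return boolean (`True`/`False`) indicator columns rather than 0/1 integers; in pandas 2.0 and later the default is boolean. Passing `dtype=int` requests integer indicators explicitly. The sketch below shows this on a made-up frame (`toy` and its values are illustrative assumptions).

```python
import pandas as pd

# Hypothetical frame with the same 'Screen' categories as the lab dataset
toy = pd.DataFrame({
    "Screen": ["IPS Panel", "Full HD", "Full HD"],
    "Price": [978, 634, 946],
})

# dtype=int forces 0/1 indicators regardless of the pandas default
indicators = pd.get_dummies(toy["Screen"], prefix="Screen", dtype=int)

# Append the indicators and drop the original attribute
toy = pd.concat([toy, indicators], axis=1).drop(columns="Screen")

print(toy)
```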

In [40]:
df

| |Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|Screen_Full HD|Screen_IPS Panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|2|1|5|0.551724|8|256|978|14.000008|3.527392|0|1|
|1|1|Dell|3|1|1|3|0.689655|4|256|634|15.600008|4.850164|1|0|
|2|2|Dell|3|1|1|7|0.931034|8|256|946|15.600008|4.850164|1|0|
|3|3|Dell|4|2|1|5|0.551724|8|128|1244|13.300007|2.689636|0|1|
|4|4|HP|4|2|1|7|0.620690|8|256|837|15.600008|4.210824|1|0|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|2|1|7|0.896552|8|256|1891|14.000008|3.747854|0|1|
|234|234|Toshiba|3|2|1|5|0.827586|8|256|1950|13.300007|2.645544|1|0|
|235|235|Lenovo|4|2|1|5|0.896552|8|256|2236|12.000006|2.998283|0|1|
|236|236|Lenovo|3|3|1|5|0.862069|6|256|883|15.600008|5.291088|1|0|


In [41]:
# Convert 'Price' from USD to euros, assuming a fixed rate of 1 USD = 0.85 EUR
conversion_rate = 0.85

df['Price'] = df['Price'] * conversion_rate

In [42]:
df

| |Unnamed: 0|Manufacturer|Category|GPU|OS|CPU_core|CPU_frequency|RAM_GB|Storage_GB_SSD|Price|Screen_Size_inch|Weight_pounds|Screen_Full HD|Screen_IPS Panel|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|0|Acer|4|2|1|5|0.551724|8|256|831.30|14.000008|3.527392|0|1|
|1|1|Dell|3|1|1|3|0.689655|4|256|538.90|15.600008|4.850164|1|0|
|2|2|Dell|3|1|1|7|0.931034|8|256|804.10|15.600008|4.850164|1|0|
|3|3|Dell|4|2|1|5|0.551724|8|128|1057.40|13.300007|2.689636|0|1|
|4|4|HP|4|2|1|7|0.620690|8|256|711.45|15.600008|4.210824|1|0|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|233|233|Lenovo|4|2|1|7|0.896552|8|256|1607.35|14.000008|3.747854|0|1|
|234|234|Toshiba|3|2|1|5|0.827586|8|256|1657.50|13.300007|2.645544|1|0|
|235|235|Lenovo|4|2|1|5|0.896552|8|256|1900.60|12.000006|2.998283|0|1|
|236|236|Lenovo|3|3|1|5|0.862069|6|256|750.55|15.600008|5.291088|1|0|


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
