<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for the code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, you can install them as shown below.


In [1]:
%pip install seaborn

Note: you may need to restart the kernel to use updated packages.


### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Note that this step is essential in JupyterLite. If you are running a downloaded copy of this notebook in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset into a data frame.


In [3]:
import pandas as pd

df = pd.read_csv(URL)
df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,Screen,GPU,OS,CPU_core,Screen_Size_cm,CPU_frequency,RAM_GB,Storage_GB_SSD,Weight_kg,Price
0,0,Acer,4,IPS Panel,2,1,5,35.56,1.6,8,256,1.6,978
1,1,Dell,3,Full HD,1,1,3,39.624,2.0,4,256,2.2,634
2,2,Dell,3,Full HD,1,1,7,39.624,2.7,8,256,2.2,946
3,3,Dell,4,IPS Panel,2,1,5,33.782,1.6,8,128,1.22,1244
4,4,HP,4,Full HD,2,1,7,39.624,1.8,8,256,1.91,837


In [4]:
columns_with_missing_values = df.columns[df.isnull().any()]
columns_with_missing_values

Index(['Screen_Size_cm', 'Weight_kg'], dtype='object')

In [5]:
# Replace missing values with the most frequent value
most_frequent_value = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_value)

In [6]:
# Replace missing values with the mean value
mean_value = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_value)

In [7]:
# Change the data type of 'Screen_Size_cm' and 'Weight_kg' to float
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

In [8]:
# Convert 'Screen_Size_cm' to inches and modify the attribute name
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701
df.drop('Screen_Size_cm', axis=1, inplace=True)

# Convert 'Weight_kg' to pounds and modify the attribute name
df['Weight_lbs'] = df['Weight_kg'] * 2.20462
df.drop('Weight_kg', axis=1, inplace=True)


In [9]:
# Normalize the content under 'CPU_frequency' with respect to its maximum value
max_value = df['CPU_frequency'].max()
df['CPU_frequency'] = df['CPU_frequency'] / max_value

In [10]:
# Convert the 'Screen' attribute into indicator variables
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append df1 into original data frame df
df = pd.concat([df, df1], axis=1)

# Drop the 'Screen' attribute from the data frame
df.drop('Screen', axis=1, inplace=True)

In [12]:
df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Price,Screen_Size_inch,Weight_lbs,Screen_Full HD,Screen_IPS Panel
0,0,Acer,4,2,1,5,0.551724,8,256,978,14.000008,3.527392,False,True
1,1,Dell,3,1,1,3,0.689655,4,256,634,15.600008,4.850164,True,False
2,2,Dell,3,1,1,7,0.931034,8,256,946,15.600008,4.850164,True,False
3,3,Dell,4,2,1,5,0.551724,8,128,1244,13.300007,2.689636,False,True
4,4,HP,4,2,1,7,0.62069,8,256,837,15.600008,4.210824,True,False


---


In [13]:
# Convert the 'Price' attribute from USD to Euros
df['Price_Euros'] = df['Price'] * 0.92
df.drop('Price', axis=1, inplace=True)

# Write a Python code that can perform min-max normalization on the 'CPU_frequency' parameter
df['CPU_frequency_normalized'] = (df['CPU_frequency'] - df['CPU_frequency'].min()) / (df['CPU_frequency'].max() - df['CPU_frequency'].min())

In [14]:
df.head()

Unnamed: 0.1,Unnamed: 0,Manufacturer,Category,GPU,OS,CPU_core,CPU_frequency,RAM_GB,Storage_GB_SSD,Screen_Size_inch,Weight_lbs,Screen_Full HD,Screen_IPS Panel,Price_Euros,CPU_frequency_normalized
0,0,Acer,4,2,1,5,0.551724,8,256,14.000008,3.527392,False,True,899.76,0.235294
1,1,Dell,3,1,1,3,0.689655,4,256,15.600008,4.850164,True,False,583.28,0.470588
2,2,Dell,3,1,1,7,0.931034,8,256,15.600008,4.850164,True,False,870.32,0.882353
3,3,Dell,4,2,1,5,0.551724,8,128,13.300007,2.689636,False,True,1144.48,0.235294
4,4,HP,4,2,1,7,0.62069,8,256,15.600008,4.210824,True,False,770.04,0.352941


# Test Environment


**Building the prompt: Importing the data set**

In [None]:
# 1. Write a Python code that can perform the following tasks.
# 2. Read the CSV file, located on a given file path, into a Pandas data frame, 
#    assuming that the first row of the file contains the headers for the data.

import pandas as pd

# Specify the file path
file_path = "path/to/your/file.csv"

# Read the CSV file into a Pandas data frame
df = pd.read_csv(file_path)

# Because the first row of the file contains the headers, you don't need to specify any additional parameters

# Additional details:
# - The `pd.read_csv()` function is used to read a CSV file into a Pandas data frame.
# - By default, it assumes that the first row of the file contains the headers for the data.
# - If your file doesn't have headers, you can specify `header=None` as an additional parameter.
# - You can also specify other parameters, such as `sep` to specify the delimiter used in the file.
# - Make sure you have the Pandas library installed in your Python environment before running this code.

**Building the prompt: Handle the missing data**

In [None]:
import pandas as pd

# Assuming you already have a Pandas data frame named 'df'

# Identify columns with missing values
columns_with_missing_values = df.columns[df.isnull().any()]

# Additional details:
# - The `df.isnull()` function returns a Boolean data frame where each cell is True if it contains a missing value (NaN), and False otherwise.
# - The `df.columns` attribute returns the column labels of the data frame.
# - The `.any()` method reduces each column to a single Boolean, returning a Series that is True for every column containing at least one missing value.
# - Finally, indexing `df.columns` with that Boolean Series keeps only the labels of the columns where it is True.

# You can now use the 'columns_with_missing_values' variable to further analyze or handle the columns with missing values.
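
The chain `df.columns[df.isnull().any()]` can be followed step by step on a small frame. The sketch below is only an illustration: the frame `toy` and its values are hypothetical, and only the column names are borrowed from the lab dataset. It also adds `isnull().sum()`, which reports how many entries are missing in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are borrowed from the lab.
toy = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "HP"],
    "Screen_Size_cm": [35.56, np.nan, 39.624],
    "Weight_kg": [1.6, 2.2, np.nan],
})

# Boolean Series: one entry per column, True if the column has any NaN
has_missing = toy.isnull().any()

# Boolean indexing on the column labels keeps only the True entries
columns_with_missing_values = toy.columns[has_missing]
print(list(columns_with_missing_values))

# Number of missing entries in each column
missing_counts = toy.isnull().sum()
print(missing_counts)
```

The per-column counts are a quick way to judge whether filling in values is reasonable or whether too much of a column is missing.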

In [None]:
# Write a Python code to replace the missing values in a pandas data frame, per the following guidelines.
# 1. For a categorical attribute "Screen_Size_cm", replace the missing values with the most frequent value in the column.
# 2. For a continuous value attribute "Weight_kg", replace the missing values with the mean value of the entries in the column.

import pandas as pd

# Assuming you already have a Pandas data frame named 'df'

# Replace missing values in the 'Screen_Size_cm' column with the most frequent value
most_frequent_value = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_value)

# Replace missing values in the 'Weight_kg' column with the mean value
mean_value = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_value)

# Additional details:
# - The `.mode()` method is used to calculate the most frequent value in a column.
# - The `[0]` indexing is used to retrieve the most frequent value from the resulting Series.
# - The `.fillna()` method is used to replace missing values with a specified value.
# - The result is assigned back to the column because calling `.fillna(..., inplace=True)` on a selected column may not update the original data frame in recent pandas versions.

# You can now use the modified 'df' data frame, which has the missing values replaced according to the guidelines.
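
One detail behind the `[0]` index is that `.mode()` returns a Series, and that Series holds more than one entry when several values tie for most frequent. The sketch below uses hypothetical values chosen to produce such a tie; it is not taken from the lab dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so that two sizes tie for most frequent.
sizes = pd.Series([35.56, 35.56, 39.624, 39.624, np.nan])

modes = sizes.mode()             # sorted Series of all tied modes; NaN is ignored
most_frequent_value = modes[0]   # the smallest of the tied values

# Assign the result of fillna to a new Series instead of modifying in place
filled = sizes.fillna(most_frequent_value)
print(list(modes))
print(filled.isnull().sum())
```

When a tie occurs, `[0]` silently picks the smallest tied value, so it may be worth inspecting `modes` before filling.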

**Building the prompt: Modify data type**

In [None]:
# Write a Python code snippet to change the data type of the attributes "Screen_Size_cm" and "Weight_kg" of a data frame to float.

import pandas as pd

# Assuming you already have a Pandas data frame named 'df'

# Change the data type of 'Screen_Size_cm' and 'Weight_kg' to float
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

# Additional details:
# - The `.astype()` method is used to change the data type of a column.
# - In this case, we're specifying `float` as the desired data type.
# - Make sure the columns contain numeric values that can be converted to float.
# - If there are any non-numeric values in the columns, the conversion will raise an error.

# You can now use the modified 'df' data frame, which has the data types of 'Screen_Size_cm' and 'Weight_kg' changed to float.
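
The note about non-numeric values can be illustrated with a small sketch. The Series `raw` below is hypothetical, standing in for a column that was read as text. `pd.to_numeric` with `errors="coerce"` is one possible alternative to `.astype(float)` when such entries should become missing values rather than stop the conversion.

```python
import pandas as pd

# Hypothetical column read as text, with one entry that is not a number.
raw = pd.Series(["35.56", "n/a", "39.624"])

try:
    raw.astype(float)
    conversion_failed = False
except ValueError:
    # astype(float) cannot parse "n/a" and raises
    conversion_failed = True

# errors="coerce" turns unparseable entries into NaN instead of raising
cleaned = pd.to_numeric(raw, errors="coerce")
print(conversion_failed)
print(cleaned.dtype)
```

The coerced entries show up as NaN, so they can then be handled with the same missing-value steps used earlier in this notebook.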

**Building the prompt: Standardization and Normalization**

In [None]:
#Write a Python code to modify the contents under the following attributes of the data frame as required.
#1. Data under 'Screen_Size_cm' is assumed to be in centimeters. Convert this data into inches. Modify the name of the attribute to 'Screen_Size_inch'.
#2. Data under 'Weight_kg' is assumed to be in kilograms. Convert this data into pounds. Modify the name of the attribute to 'Weight_pounds'.

import pandas as pd

# Assuming you already have a Pandas data frame named 'df'

# Convert 'Screen_Size_cm' from centimeters to inches and modify the attribute name
df['Screen_Size_inch'] = df['Screen_Size_cm'] * 0.393701
df.drop('Screen_Size_cm', axis=1, inplace=True)

# Convert 'Weight_kg' from kilograms to pounds and modify the attribute name
df['Weight_pounds'] = df['Weight_kg'] * 2.20462
df.drop('Weight_kg', axis=1, inplace=True)

# Additional details:
# - The code multiplies the values under 'Screen_Size_cm' by 0.393701 to convert centimeters to inches.
# - The resulting values are stored in a new attribute named 'Screen_Size_inch'.
# - The original 'Screen_Size_cm' attribute is dropped from the data frame using the `.drop()` method.
# - Similarly, the code multiplies the values under 'Weight_kg' by 2.20462 to convert kilograms to pounds.
# - The resulting values are stored in a new attribute named 'Weight_pounds'.
# - The original 'Weight_kg' attribute is dropped from the data frame.

# You can now use the modified 'df' data frame, which has the contents and attribute names modified as required.
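
The factor 0.393701 is a rounded value of 1/2.54, which is why the converted screen sizes in the earlier output read 14.000008 rather than 14.0. As a hedged variation, the sketch below divides by 2.54 (the inch is defined as exactly 2.54 cm) and keeps the same rounded pound factor. The frame `toy` and the constant names are hypothetical; the values are taken from the first two rows shown earlier.

```python
import pandas as pd

CM_PER_INCH = 2.54     # exact by definition
LB_PER_KG = 2.20462    # rounded factor, same as in the cell above

# Hypothetical two-row frame using values from the first rows of the dataset.
toy = pd.DataFrame({"Screen_Size_cm": [35.56, 39.624], "Weight_kg": [1.6, 2.2]})

toy["Screen_Size_inch"] = toy["Screen_Size_cm"] / CM_PER_INCH
toy["Weight_pounds"] = toy["Weight_kg"] * LB_PER_KG

# drop() returns a new frame, so the result is assigned back
toy = toy.drop(columns=["Screen_Size_cm", "Weight_kg"])
print(toy.round(3))
```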

In [None]:
# Write a Python code to normalize the content under the attribute "CPU_frequency" in a data frame df with respect to its maximum value. 
# Make changes to the original data, and do not create a new attribute.

import pandas as pd

# Assuming you already have a Pandas data frame named 'df'

# Normalize the content under 'CPU_frequency' with respect to its maximum value
max_value = df['CPU_frequency'].max()
df['CPU_frequency'] = df['CPU_frequency'] / max_value

# Additional details:
# - The code calculates the maximum value of the 'CPU_frequency' attribute using the `.max()` method.
# - It then divides the values under 'CPU_frequency' by the maximum value to normalize them.
# - The resulting normalized values overwrite the original values in the 'CPU_frequency' attribute.

# You can now use the modified 'df' data frame, which has the content under the 'CPU_frequency' attribute normalized.

**Building the prompt: Categorical to numerical**

In [None]:
# Write a Python code to perform the following tasks.
# 1. Convert a data frame df attribute "Screen", into indicator variables, saved as df1, with the naming convention "Screen_<unique value of the attribute>".
# 2. Append df1 into the original data frame df.
# 3. Drop the original attribute from the data frame df.

import pandas as pd

# Assuming you already have a Pandas data frame named 'df'

# Convert the 'Screen' attribute into indicator variables
df1 = pd.get_dummies(df['Screen'], prefix='Screen')

# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)

# Drop the original 'Screen' attribute from the data frame
df.drop('Screen', axis=1, inplace=True)

# Additional details:
# - The `pd.get_dummies()` function is used to convert a categorical attribute into indicator variables.
# - The resulting indicator variables are stored in a new data frame named 'df1'.
# - The `prefix` parameter is used to specify the naming convention for the indicator variables.
# - The `pd.concat()` function is used to concatenate the original data frame 'df' and 'df1' along the column axis (axis=1).
# - The resulting concatenated data frame is assigned back to 'df'.
# - Finally, the `.drop()` method is used to drop the original 'Screen' attribute from 'df'.

# You can now use the modified 'df' data frame, which has the 'Screen' attribute converted into indicator variables, appended, and the original attribute dropped.
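
The three steps above (encode, append, drop) can also be collapsed into a single `pd.get_dummies()` call by passing the whole frame with the `columns` parameter. This is a sketch of an alternative, not the approach the prompt asks for; the frame `toy` is hypothetical, and `dtype=int` is an optional choice that produces 0/1 values instead of True/False.

```python
import pandas as pd

# Hypothetical three-row frame; the 'Screen' values match those in the dataset.
toy = pd.DataFrame({
    "Manufacturer": ["Acer", "Dell", "HP"],
    "Screen": ["IPS Panel", "Full HD", "Full HD"],
})

# columns=[...] encodes the listed column and drops the original in one call;
# dtype=int yields 0/1 instead of True/False.
encoded = pd.get_dummies(toy, columns=["Screen"], prefix="Screen", dtype=int)
print(encoded)
```

The explicit three-step version keeps the intermediate `df1` available for inspection, which the prompt requires; the one-call version is shorter when that intermediate frame is not needed.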

**Practice Problems**

In [None]:
# Write a Python code that can perform the column price conversion from USD to Euros
import pandas as pd

# Define the file path
file_path = 'path/to/your/file.csv'  # Replace with the actual file path

# Exchange rate
usd_to_euro_rate = 0.92

try:
    # Read the CSV file into a pandas DataFrame
    df = pd.read_csv(file_path)
    print("CSV file successfully loaded into a DataFrame.")

    # Check if the 'Price' column exists
    if 'Price' in df.columns:
        # Convert Price to Euros and create a new column
        df['Price_in_Euros'] = pd.to_numeric(df['Price'], errors='coerce') * usd_to_euro_rate
        print("Price column successfully converted to Euros.")
    else:
        print("Error: The 'Price' column does not exist in the DataFrame.")

    # Preview the updated DataFrame
    print("Here is a preview of the data:")
    print(df.head())

    # Optional: Save the updated DataFrame to a new CSV file
    output_path = 'path/to/your/output_file.csv'  # Replace with the desired output path
    df.to_csv(output_path, index=False)
    print(f"Updated DataFrame saved to {output_path}.")
    
except FileNotFoundError:
    print(f"Error: The file at '{file_path}' was not found.")
except pd.errors.EmptyDataError:
    print("Error: The file is empty or does not contain valid data.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


In [None]:
# Write a Python code that can perform min-max normalization on the 'CPU_frequency' parameter
import pandas as pd

# Sample DataFrame (Replace with your actual data)
data = {'CPU_frequency': [2.4, 3.2, 1.8, 2.9, 3.6]}
df = pd.DataFrame(data)

# Check if 'CPU_frequency' exists in the DataFrame
if 'CPU_frequency' in df.columns:
    # Perform min-max normalization
    min_value = df['CPU_frequency'].min()
    max_value = df['CPU_frequency'].max()
    
    df['CPU_frequency_normalized'] = (df['CPU_frequency'] - min_value) / (max_value - min_value)
    
    print("Normalization complete. Here is the updated DataFrame:")
    print(df)
else:
    print("Error: 'CPU_frequency' column not found in the DataFrame.")
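
Min-max normalization divides by `max - min`, which is zero when every value in the column is the same, and the division then fills the column with NaN. The sketch below wraps the formula in a hypothetical helper, `min_max_normalize`, that guards against this case. Returning all zeros for a constant column is an assumption made here for illustration, not a convention from the lab.

```python
import pandas as pd

def min_max_normalize(values: pd.Series) -> pd.Series:
    """Scale values to [0, 1]; return all zeros if every value is equal."""
    value_range = values.max() - values.min()
    if value_range == 0:
        # Avoid 0/0, which would otherwise fill the column with NaN
        return pd.Series(0.0, index=values.index)
    return (values - values.min()) / value_range

freq = pd.Series([2.4, 3.2, 1.8, 2.9, 3.6])   # sample values from the cell above
scaled = min_max_normalize(freq)
constant = min_max_normalize(pd.Series([2.0, 2.0, 2.0]))
print(scaled.round(3).tolist())
print(constant.tolist())
```

Unlike dividing by the maximum alone, this mapping always sends the smallest value to 0 and the largest to 1.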


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
