<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="Skills Network Logo">
    </a>
</p>


# Test Environment for Generative AI classroom labs

This lab provides a test environment for code generated in the Generative AI classroom.

Follow the instructions below to set up this environment for further use.


# Setup


### Install required libraries

If your task requires additional Python libraries, install them as shown below.


In [1]:
%pip install seaborn
import piplite

await piplite.install(['nbformat', 'plotly'])

### Dataset URL from the GenAI lab
Use the URL provided in the GenAI lab in the cell below. 


In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-Coursera/laptop_pricing_dataset_mod1.csv"

### Downloading the dataset

Execute the following code to download the dataset into the interface.

> Please note that this step is essential in JupyterLite. If you are using a downloaded version of this notebook and running it in JupyterLab, you can skip this step and pass the URL directly to the `pandas.read_csv()` function to read the dataset as a DataFrame.
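
Outside JupyterLite, `pandas.read_csv()` accepts a URL as well as a local path or file-like object, so the download step can be skipped. The sketch below illustrates the direct read; the in-memory `sample_csv` is a hypothetical two-row stand-in so the snippet runs offline, and in JupyterLab you would pass `URL` instead.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the remote CSV so this runs without network access.
sample_csv = io.StringIO("Manufacturer,Price\nAcer,978\nDell,634\n")

# In JupyterLab, replace sample_csv with URL: pd.read_csv(URL)
df_direct = pd.read_csv(sample_csv)
print(df_direct.head())
```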


In [10]:
from pyodide.http import pyfetch

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())

file_path = URL

# Download from the URL stored in file_path and save it locally as dataset.csv
await download(file_path, "dataset.csv")

---


# Test Environment


In [12]:
# Keep appending the code generated to this cell, or add more cells below this to execute in parts
import pandas as pd

def read_csv_file(file_path):
    """
    Reads a CSV file into a Pandas DataFrame.

    Args:
        file_path (str): The path to the CSV file.

    Returns:
        pd.DataFrame: The contents of the CSV file as a Pandas DataFrame.
    """
    try:
        # Attempt to read the CSV file
        df = pd.read_csv(file_path)
        return df
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
        return None
    except pd.errors.EmptyDataError:
        print(f"Error: The file '{file_path}' is empty.")
        return None
    except pd.errors.ParserError as e:
        print(f"Error: An error occurred while parsing the file '{file_path}': {e}")
        return None

# Example usage:
file_path = 'dataset.csv'  # Replace with your file path
df = read_csv_file(file_path)
if df is not None:
    print(df.head())  # Print the first few rows of the DataFrame

   Unnamed: 0 Manufacturer  Category     Screen  GPU  OS  CPU_core  \
0           0         Acer         4  IPS Panel    2   1         5   
1           1         Dell         3    Full HD    1   1         3   
2           2         Dell         3    Full HD    1   1         7   
3           3         Dell         4  IPS Panel    2   1         5   
4           4           HP         4    Full HD    2   1         7   

   Screen_Size_cm  CPU_frequency  RAM_GB  Storage_GB_SSD  Weight_kg  Price  
0          35.560            1.6       8             256       1.60    978  
1          39.624            2.0       4             256       2.20    634  
2          39.624            2.7       8             256       2.20    946  
3          33.782            1.6       8             128       1.22   1244  
4          39.624            1.8       8             256       1.91    837  
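
The `Unnamed: 0` column in the output above is what pandas shows when the first CSV column has no header, which commonly happens when a file was saved together with its row index. One possible way to avoid it, sketched here on a hypothetical in-memory sample rather than the lab dataset, is to pass `index_col=0` to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical sample whose first, unnamed column is a saved row index.
sample_csv = io.StringIO(",Manufacturer,Price\n0,Acer,978\n1,Dell,634\n")

# index_col=0 uses the first column as the row index instead of creating 'Unnamed: 0'.
df_indexed = pd.read_csv(sample_csv, index_col=0)
print(df_indexed.columns.tolist())
```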


In [13]:
# Create a sample DataFrame with missing values
data = {
    'Name': ['John', 'Mary', 'David', 'Jane', 'Bob', None, None, None],
    'Age': [25, 31, 42, 28, 35, None, None, None],
    'City': ['New York', 'Chicago', 'Los Angeles', 'Houston', 'Seattle', None, None, None]
}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Identify columns with missing values
missing_columns = df.isnull().sum()

# Print the columns with missing values
print("\nColumns with missing values:")
print(missing_columns)

# Identify columns with missing values by column name
missing_columns_by_name = df.columns[missing_columns > 0]

# Print the columns with missing values by column name
print("\nColumns with missing values by column name:")
print(missing_columns_by_name)

Original DataFrame:
    Name   Age         City
0   John  25.0     New York
1   Mary  31.0      Chicago
2  David  42.0  Los Angeles
3   Jane  28.0      Houston
4    Bob  35.0      Seattle
5   None   NaN         None
6   None   NaN         None
7   None   NaN         None

Columns with missing values:
Name    3
Age     3
City    3
dtype: int64

Columns with missing values by column name:
Index(['Name', 'Age', 'City'], dtype='object')
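
Counts alone do not show how much of each column is missing. As a small extension of the cell above, the following sketch reports the share of missing values per column as a percentage; the `df_missing` sample is hypothetical and chosen so the columns differ.

```python
import pandas as pd

# Hypothetical sample: 'Name' is half empty, 'Age' a quarter empty, 'City' complete.
df_missing = pd.DataFrame({
    'Name': ['John', 'Mary', None, None],
    'Age': [25, None, 42, 28],
    'City': ['New York', 'Chicago', 'Houston', 'Seattle'],
})

# isnull() gives booleans; their column-wise mean is the fraction missing.
missing_pct = df_missing.isnull().mean() * 100

# Keep only the columns that actually have gaps, largest share first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```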


In [15]:
import numpy as np

# Create a sample DataFrame with missing values
data = {
    'Screen_Size_cm': [100, 120, 150, 180, 200, None, 220, 250],
    'Weight_kg': [50, 60, 70, 80, 90, 100, 110, 120],
    'Age': [25, 30, 35, 40, 45, 50, 55, 60]
}
df = pd.DataFrame(data)

# Replace missing values in 'Screen_Size_cm' with the most frequent value
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(df['Screen_Size_cm'].mode().iloc[0])

# Replace missing values in 'Weight_kg' with the mean value
df['Weight_kg'] = df['Weight_kg'].fillna(df['Weight_kg'].mean())

# Print the DataFrame after replacement
print("DataFrame after replacing missing values in 'Screen_Size_cm':")
print(df)

print("\nDataFrame after replacing missing values in 'Weight_kg':")
print(df)

DataFrame after replacing missing values in 'Screen_Size_cm':
   Screen_Size_cm  Weight_kg  Age
0           100.0         50   25
1           120.0         60   30
2           150.0         70   35
3           180.0         80   40
4           200.0         90   45
5           100.0        100   50
6           220.0        110   55
7           250.0        120   60

DataFrame after replacing missing values in 'Weight_kg':
   Screen_Size_cm  Weight_kg  Age
0           100.0         50   25
1           120.0         60   30
2           150.0         70   35
3           180.0         80   40
4           200.0         90   45
5           100.0        100   50
6           220.0        110   55
7           250.0        120   60


In [16]:
# Create a sample DataFrame with missing values
data = {
    'Screen_Size_cm': [100, 120, 150, 180, 200, None, 220, 250],
    'Weight_kg': [50, 60, 70, 80, 90, 100, 110, 120],
    'Age': [25, 30, 35, 40, 45, 50, 55, 60]
}
df = pd.DataFrame(data)

# Replace missing values in 'Screen_Size_cm' with the most frequent value
most_frequent_value = df['Screen_Size_cm'].mode()[0]
df['Screen_Size_cm'] = df['Screen_Size_cm'].fillna(most_frequent_value)

# Replace missing values in 'Weight_kg' with the mean value
mean_value = df['Weight_kg'].mean()
df['Weight_kg'] = df['Weight_kg'].fillna(mean_value)

# Additional details:
# - The `.mode()` method is used to calculate the most frequent value in a column.
# - The `[0]` indexing is used to retrieve the most frequent value from the resulting Series.
# - The `.fillna()` method is used to replace missing values with a specified value.
# - The result is assigned back to the column. Calling `df[col].fillna(value, inplace=True)`
#   acts on an intermediate copy, and pandas warns that this form will stop working in pandas 3.0.

# Now, use the modified 'df' data frame
print("Modified DataFrame:")
print(df)

Modified DataFrame:
   Screen_Size_cm  Weight_kg  Age
0           100.0         50   25
1           120.0         60   30
2           150.0         70   35
3           180.0         80   40
4           200.0         90   45
5           100.0        100   50
6           220.0        110   55
7           250.0        120   60
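
When several columns need different replacement values, `DataFrame.fillna` also accepts a dictionary that maps column names to fill values. The sketch below shows that form on a hypothetical four-row sample (`df_fill`), not on the lab dataset.

```python
import pandas as pd

# Hypothetical sample with one gap in each column.
df_fill = pd.DataFrame({
    'Screen_Size_cm': [100, 120, 120, None],
    'Weight_kg': [50, None, 70, 90],
})

# Map each column to its own replacement value.
fill_values = {
    'Screen_Size_cm': df_fill['Screen_Size_cm'].mode().iloc[0],  # most frequent value
    'Weight_kg': df_fill['Weight_kg'].mean(),                    # average value
}

# DataFrame.fillna returns a new frame; assign it back to keep the result.
df_fill = df_fill.fillna(fill_values)
print(df_fill)
```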


In [17]:
# Create a sample DataFrame with missing values
data = {
    'Screen_Size_cm': [100, 120, 150, 180, 200, None, 220, 250],
    'Weight_kg': [50, 60, 70, 80, 90, 100, 110, 120],
    'Age': [25, 30, 35, 40, 45, 50, 55, 60]
}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Change the data type of 'Screen_Size_cm' and 'Weight_kg' to float
df['Screen_Size_cm'] = df['Screen_Size_cm'].astype(float)
df['Weight_kg'] = df['Weight_kg'].astype(float)

# Print the modified DataFrame
print("\nModified DataFrame:")
print(df)

Original DataFrame:
   Screen_Size_cm  Weight_kg  Age
0           100.0         50   25
1           120.0         60   30
2           150.0         70   35
3           180.0         80   40
4           200.0         90   45
5             NaN        100   50
6           220.0        110   55
7           250.0        120   60

Modified DataFrame:
   Screen_Size_cm  Weight_kg  Age
0           100.0       50.0   25
1           120.0       60.0   30
2           150.0       70.0   35
3           180.0       80.0   40
4           200.0       90.0   45
5             NaN      100.0   50
6           220.0      110.0   55
7           250.0      120.0   60
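
`astype(float)` raises a `ValueError` when a column holds strings that cannot be parsed as numbers. If that is a possibility in your data, `pd.to_numeric` with `errors='coerce'` is one alternative, sketched below on a hypothetical text column; the unparseable entry becomes `NaN` and can then be handled like any other missing value.

```python
import pandas as pd

# Hypothetical column where numbers arrived as text, with one unparseable entry.
raw = pd.Series(['35.56', '39.624', 'unknown', '33.782'], name='Screen_Size_cm')

# errors='coerce' turns unparseable entries into NaN instead of raising.
screen_cm = pd.to_numeric(raw, errors='coerce')
print(screen_cm.dtype)
print(screen_cm.isnull().sum())
```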


In [18]:
# Create a sample DataFrame with missing values
data = {
    'Screen_Size_cm': [100, 120, 150, 180, 200, None, 220, 250],
    'Weight_kg': [50, 60, 70, 80, 90, 100, 110, 120],
    'Age': [25, 30, 35, 40, 45, 50, 55, 60]
}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Convert 'Screen_Size_cm' to inches (1 inch = 2.54 cm) and modify the attribute name
df['Screen_Size_cm'] = df['Screen_Size_cm'] / 2.54
df.rename(columns={'Screen_Size_cm': 'Screen_Size_inch'}, inplace=True)

# Convert 'Weight_kg' to pounds (1 kg = 2.20462 lb) and modify the attribute name
df['Weight_kg'] = df['Weight_kg'] * 2.20462
df.rename(columns={'Weight_kg': 'Weight_pounds'}, inplace=True)

# Print the modified DataFrame
print("\nModified DataFrame:")
print(df)

Original DataFrame:
   Screen_Size_cm  Weight_kg  Age
0           100.0         50   25
1           120.0         60   30
2           150.0         70   35
3           180.0         80   40
4           200.0         90   45
5             NaN        100   50
6           220.0        110   55
7           250.0        120   60

Modified DataFrame:
   Screen_Size_inch  Weight_pounds  Age
0         39.370079       110.2310   25
1         47.244094       132.2772   30
2         59.055118       154.3234   35
3         70.866142       176.3696   40
4         78.740157       198.4158   45
5               NaN       220.4620   50
6         86.614173       242.5082   55
7         98.425197       264.5544   60
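
As a sanity check on unit conversions, it can help to run them on values whose result is easy to recognize. The sketch below uses the screen sizes and weights from the first rows of the laptop dataset printed earlier; the constant names `CM_PER_INCH` and `LB_PER_KG` and the rounding precision are illustrative choices, not part of the lab.

```python
import pandas as pd

CM_PER_INCH = 2.54    # exact by definition
LB_PER_KG = 2.20462   # rounded factor, as used in the cell above

# Values taken from the first rows of the laptop dataset shown earlier.
laptops = pd.DataFrame({
    'Screen_Size_cm': [35.560, 39.624, 33.782],
    'Weight_kg': [1.60, 2.20, 1.22],
})

# Centimeters are divided by 2.54 to get inches; kilograms are multiplied to get pounds.
converted = pd.DataFrame({
    'Screen_Size_inch': (laptops['Screen_Size_cm'] / CM_PER_INCH).round(1),
    'Weight_pounds': (laptops['Weight_kg'] * LB_PER_KG).round(2),
})
print(converted)
```

The screen sizes come out as 14.0, 15.6, and 13.3 inches, which are familiar laptop sizes and suggest the direction of the conversion is right.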

In [20]:
# Create a sample DataFrame
data = {
    'CPU_frequency': [100, 120, 150, 180, 200, 220, 250, 300],
    'Memory': [8, 10, 12, 14, 16, 18, 20, 22]
}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Find the minimum and maximum CPU_frequency
min_cpu_frequency = df['CPU_frequency'].min()
max_cpu_frequency = df['CPU_frequency'].max()

# Normalize CPU_frequency to a range of 0 to 100 (min-max scaling)
df['Normalized_CPU_frequency'] = (df['CPU_frequency'] - min_cpu_frequency) / (max_cpu_frequency - min_cpu_frequency) * 100

# Print the modified DataFrame
print("\nModified DataFrame:")
print(df)

Original DataFrame:
   CPU_frequency  Memory
0            100       8
1            120      10
2            150      12
3            180      14
4            200      16
5            220      18
6            250      20
7            300      22

Modified DataFrame:
   CPU_frequency  Memory  Normalized_CPU_frequency
0            100       8                       0.0
1            120      10                      10.0
2            150      12                      25.0
3            180      14                      40.0
4            200      16                      50.0
5            220      18                      60.0
6            250      20                      75.0
7            300      22                     100.0


In [21]:
# Create a sample DataFrame
data = {
    'Screen': ['100', '120', '150', '180', '200', '220', '250', '300'],
    'Memory': [8, 10, 12, 14, 16, 18, 20, 22]
}
df = pd.DataFrame(data)

# Convert 'Screen' into indicator variables
df1 = pd.get_dummies(df['Screen'], drop_first=True)

# Append df1 into the original data frame df
df = pd.concat([df, df1], axis=1)

# Drop the original attribute from the data frame df
df = df.drop('Screen', axis=1)

# Print the modified DataFrame
print("Modified DataFrame:")
print(df)

Modified DataFrame:
   Memory    120    150    180    200    220    250    300
0       8  False  False  False  False  False  False  False
1      10   True  False  False  False  False  False  False
2      12  False   True  False  False  False  False  False
3      14  False  False   True  False  False  False  False
4      16  False  False  False   True  False  False  False
5      18  False  False  False  False   True  False  False
6      20  False  False  False  False  False   True  False
7      22  False  False  False  False  False  False   True
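
Indicator columns named only by their category value (`120`, `150`, ...) are hard to trace back to the attribute they came from. The sketch below shows one way to keep them readable using the `prefix` and `dtype` parameters of `pd.get_dummies`; the `df_screen` sample is hypothetical, with category values borrowed from the `Screen` column of the laptop dataset.

```python
import pandas as pd

# Hypothetical sample using the two screen types seen in the laptop dataset.
df_screen = pd.DataFrame({
    'Screen': ['IPS Panel', 'Full HD', 'Full HD', 'IPS Panel'],
    'Memory': [8, 4, 8, 8],
})

# prefix labels each indicator column; dtype=int yields 0/1 instead of True/False.
dummies = pd.get_dummies(df_screen['Screen'], prefix='Screen', dtype=int)

# Swap the original column for its indicator columns.
df_screen = pd.concat([df_screen.drop(columns='Screen'), dummies], axis=1)
print(df_screen)
```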


In [23]:
# Create a sample DataFrame
data = {
    'Price': [100, 120, 150, 180, 200, 220, 250, 300],
    'Currency': ['USD', 'USD', 'EUR', 'EUR', 'EUR', 'EUR', 'EUR', 'EUR']
}
df = pd.DataFrame(data)

# Convert 'Price' to Euros; rows already priced in EUR are kept as they are
usd_to_eur = 0.88
df['Price_EUR'] = df['Price'].where(df['Currency'] == 'EUR', df['Price'] * usd_to_eur)

# Print the modified DataFrame
print("Modified DataFrame:")
print(df)

# Print the conversion rate
print("\nConversion rate: 1 USD = 0.88 EUR")

Modified DataFrame:
   Price Currency  Price_EUR
0    100      USD       88.0
1    120      USD      105.6
2    150      EUR      150.0
3    180      EUR      180.0
4    200      EUR      200.0
5    220      EUR      220.0
6    250      EUR      250.0
7    300      EUR      300.0

Conversion rate: 1 USD = 0.88 EUR


In [24]:
# Create a sample DataFrame
data = {
    'CPU_frequency': [100, 120, 150, 180, 200, 220, 250, 300],
    'Memory': [8, 10, 12, 14, 16, 18, 20, 22]
}
df = pd.DataFrame(data)

# Perform min-max normalization on CPU_frequency
df['Normalized_CPU_frequency'] = (df['CPU_frequency'] - df['CPU_frequency'].min()) / (df['CPU_frequency'].max() - df['CPU_frequency'].min())

# Print the modified DataFrame
print("Modified DataFrame:")
print(df)

# Print the normalized CPU_frequency
print("\nNormalized CPU_frequency:")
print(df['Normalized_CPU_frequency'])

# Print the range of normalized CPU_frequency
print("\nRange of normalized CPU_frequency:")
print(df['Normalized_CPU_frequency'].min(), df['Normalized_CPU_frequency'].max())


Modified DataFrame:
   CPU_frequency  Memory  Normalized_CPU_frequency
0            100       8                      0.00
1            120      10                      0.10
2            150      12                      0.25
3            180      14                      0.40
4            200      16                      0.50
5            220      18                      0.60
6            250      20                      0.75
7            300      22                      1.00

Normalized CPU_frequency:
0    0.00
1    0.10
2    0.25
3    0.40
4    0.50
5    0.60
6    0.75
7    1.00
Name: Normalized_CPU_frequency, dtype: float64

Range of normalized CPU_frequency:
0.0 1.0
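
If the same scaling is needed for several columns, the min-max formula above can be wrapped in a small helper. The following is a minimal sketch; the function name `min_max_normalize` and its handling of constant columns (returning zeros to avoid dividing by zero) are assumptions for illustration, not part of the lab.

```python
import pandas as pd


def min_max_normalize(series: pd.Series) -> pd.Series:
    """Scale a numeric Series to [0, 1]; a constant Series maps to all zeros."""
    span = series.max() - series.min()
    if span == 0:
        # Avoid division by zero when every value is identical.
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - series.min()) / span


cpu = pd.Series([100, 120, 150, 180, 200, 220, 250, 300], name='CPU_frequency')
normalized = min_max_normalize(cpu)
print(normalized.tolist())
```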


## Authors


[Abhishek Gagneja](https://www.linkedin.com/in/abhishek-gagneja-23051987/)


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-12-10|0.1|Abhishek Gagneja|Initial Draft created|


Copyright © 2023 IBM Corporation. All rights reserved.
