### Reading CSV Files with Encoding

1. Import the pandas library

In [1]:
import pandas as pd

The general syntax for reading a file with a specific encoding is:

`df = pd.read_csv('filename.csv', encoding='some_encoding')`

2. Use the `pandas.read_csv()` function to read the `laptops.csv` file into a dataframe `laptops`.
- Specify the encoding using the string `"Latin-1"`.

In [2]:
laptops = pd.read_csv("laptops.csv", encoding="Latin-1")

3. Use the `DataFrame.info()` method to display information about the `laptops` dataframe.

In [3]:
laptops.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Manufacturer              1303 non-null   object
 1   Model Name                1303 non-null   object
 2   Category                  1303 non-null   object
 3   Screen Size               1303 non-null   object
 4   Screen                    1303 non-null   object
 5   CPU                       1303 non-null   object
 6   RAM                       1303 non-null   object
 7    Storage                  1303 non-null   object
 8   GPU                       1303 non-null   object
 9   Operating System          1303 non-null   object
 10  Operating System Version  1133 non-null   object
 11  Weight                    1303 non-null   object
 12  Price (Euros)             1303 non-null   object
dtypes: object(13)
memory usage: 132.5+ KB
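
When the encoding of a file is not known in advance, one option is to try a few candidates in order. The sketch below is only an illustration: the in-memory sample data and the `read_csv_with_fallback` helper are hypothetical and are not part of the `laptops.csv` workflow. Note that Latin-1 can decode any byte sequence, so it belongs last in the list.

```python
import io

import pandas as pd

# Hypothetical sample: a tiny CSV encoded as Latin-1 and kept in memory,
# so the example does not depend on a file on disk.
raw_bytes = "Manufacturer,Model Name\nAcer,Aspire Café\n".encode("latin-1")


def read_csv_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Try each encoding in order; return (dataframe, encoding_used)."""
    for encoding in encodings:
        try:
            # A fresh buffer is created for every attempt.
            return pd.read_csv(io.BytesIO(data), encoding=encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


sample, used = read_csv_with_fallback(raw_bytes)
print(used)
print(sample)
```

A successful decode does not prove the encoding is the right one, so it is still worth checking the text visually.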


### Cleaning Column Names

We can access the column axis of a dataframe using the `DataFrame.columns` attribute; this returns an index object -- a special type of NumPy ndarray -- with the labels of each column:

In [4]:
laptops.columns

Index(['Manufacturer', 'Model Name', 'Category', 'Screen Size', 'Screen',
       'CPU', 'RAM', ' Storage', 'GPU', 'Operating System',
       'Operating System Version', 'Weight', 'Price (Euros)'],
      dtype='object')

We can use the attribute to assign new labels to the columns:

In [5]:
laptops_test = laptops.copy()
laptops_test.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
                       'K', 'L', 'M']

laptops_test.columns

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'], dtype='object')

1. Remove any whitespace from the start and end of each column name.
- Create an empty list named `new_columns`.
- Use a for loop to iterate through each column name using the `DataFrame.columns` attribute. Inside the body of the for loop:
    - Use the `str.strip()` method to remove whitespace from the start and end of the string.
    - Append the updated column name to the `new_columns` list.
- Assign the updated column names to the `DataFrame.columns` attribute.

In [6]:
new_columns = []

for column in laptops.columns:
    column = column.strip()
    new_columns.append(column)
    
laptops.columns = new_columns

In [7]:
laptops.columns

Index(['Manufacturer', 'Model Name', 'Category', 'Screen Size', 'Screen',
       'CPU', 'RAM', 'Storage', 'GPU', 'Operating System',
       'Operating System Version', 'Weight', 'Price (Euros)'],
      dtype='object')
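
The loop above can also be written with the vectorized string methods available on the column index. This is a minimal sketch on a hypothetical two-column dataframe, not the full `laptops` data:

```python
import pandas as pd

# Hypothetical miniature dataframe with one padded label, like ' Storage'.
demo = pd.DataFrame({"RAM": ["8GB"], " Storage": ["128GB SSD"]})

# Index.str.strip() applies str.strip() to every column label at once.
demo.columns = demo.columns.str.strip()
print(list(demo.columns))
```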

### Cleaning Column Names Continued

We removed the whitespace from the column names, but the labels still contain a mix of uppercase and lowercase letters, spaces, and parentheses.

We should finish cleaning our column labels by:
- Replacing spaces with underscores
- Removing special characters
- Making all labels lowercase
- Shortening any long column names

In [8]:
# def clean_col(col):
#     col = col.strip()
#     col = col.replace("(", "")
#     col = col.replace(")", "")
#     col = col.lower()
#     return col

# new_columns = []
# for c in laptops.columns:
#     clean_c = clean_col(c)
#     new_columns.append(clean_c)
    
# laptops.columns = new_columns

# laptops.columns

1. Define a function which accepts a string argument and:
- Removes any whitespace from the start and end of the string.
- Replaces the substring `Operating System` with the abbreviation `os`.
- Replaces all spaces with underscores.
- Removes parentheses from the string.
- Makes the entire string lowercase.
- Returns the modified string.

In [9]:
def clean_col(col):
    col = col.strip()
    col = col.replace("Operating System", "os")
    col = col.replace("(", "")
    col = col.replace(")", "")
    col = col.replace(" ", "_")
    col = col.lower()
    return col

2. Use a loop to apply the function to each item in the `DataFrame.columns` attribute for the `laptops` dataframe. Assign the result back to the `DataFrame.columns` attribute.

In [10]:
new_columns = []

for c in laptops.columns:
    clean_c = clean_col(c)
    new_columns.append(clean_c)
    
laptops.columns = new_columns
laptops.columns

Index(['manufacturer', 'model_name', 'category', 'screen_size', 'screen',
       'cpu', 'ram', 'storage', 'gpu', 'os', 'os_version', 'weight',
       'price_euros'],
      dtype='object')
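
As an alternative to the explicit loop, `DataFrame.rename()` accepts a function for its `columns` argument and applies it to every label. The sketch below repeats `clean_col()` so that it runs on its own and uses a hypothetical empty dataframe that only carries three of the original labels:

```python
import pandas as pd


def clean_col(col):
    """Normalize one column label (same steps as the function above)."""
    col = col.strip()
    col = col.replace("Operating System", "os")
    col = col.replace("(", "")
    col = col.replace(")", "")
    col = col.replace(" ", "_")
    col = col.lower()
    return col


# Hypothetical empty dataframe; only the labels matter here.
demo = pd.DataFrame(columns=[" Storage", "Operating System Version", "Price (Euros)"])

# rename() calls clean_col once per column label and returns a new dataframe.
demo = demo.rename(columns=clean_col)
print(list(demo.columns))
```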

### Converting String Columns to Numeric

In [11]:
laptops.iloc[:5, 2:5]

| | category | screen_size | screen |
|---|---|---|---|
| 0 | Ultrabook | 13.3" | IPS Panel Retina Display 2560x1600 |
| 1 | Ultrabook | 13.3" | 1440x900 |
| 2 | Notebook | 15.6" | Full HD 1920x1080 |
| 3 | Ultrabook | 15.4" | IPS Panel Retina Display 2880x1800 |
| 4 | Ultrabook | 13.3" | IPS Panel Retina Display 2560x1600 |


- `category`: purely text data - there are no numeric values
- `screen_size`: numeric data stored as text data because of the `"` character
- `screen`: a combination of pure text data with numeric data

When converting text data to numeric data, we can follow this data cleaning workflow:
- Explore the data in the column
- Identify patterns and special cases
- Remove non-digit characters
- Convert the column to a numeric dtype
- Rename column if required

**Explore the Data**

One of the best ways to do this is to use the `Series.unique()` method to view all of the unique values in the column:

In [12]:
laptops["screen_size"].dtype

dtype('O')

In [13]:
laptops["screen_size"].unique()

array(['13.3"', '15.6"', '15.4"', '14.0"', '12.0"', '11.6"', '17.3"',
       '10.1"', '13.5"', '12.5"', '13.0"', '18.4"', '13.9"', '12.3"',
       '17.0"', '15.0"', '14.1"', '11.3"'], dtype=object)

**Identify Patterns and Special Cases**

Observe the following:
- All values in this column follow the same pattern - a series of digit and period characters, followed by a quote character (`"`).
- There are no special cases. Every value matches the same pattern.
- We'll need to convert the column to a `float` dtype, as the `int` dtype won't be able to store the decimal values.
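
When a column has too many unique values to check by eye, the pattern can be tested with a regular expression instead. The following is a sketch under assumptions: the series is a hypothetical sample, and the value `15 in` is invented to show what a special case would look like.

```python
import pandas as pd

# Hypothetical sample; the last value is an invented special case.
sizes = pd.Series(['13.3"', '15.6"', '15 in'])

# Digits, a period, digits, then a closing quote character.
pattern = r'^\d+\.\d+"$'
matches = sizes.str.match(pattern)

# Values that do not follow the pattern need separate handling.
special_cases = sizes[~matches].tolist()
print(special_cases)
```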

1. Use the `Series.unique()` method to identify the unique values in the `ram` column of the `laptops` dataframe. Assign the result to `unique_ram`.

In [14]:
unique_ram = laptops["ram"].unique()

2. After running your code, use the variable inspector to view the unique values in the `ram` column and identify any patterns.

In [15]:
unique_ram

array(['8GB', '16GB', '4GB', '2GB', '12GB', '6GB', '32GB', '24GB', '64GB'],
      dtype=object)

### Removing Non-Digit Characters

**Remove the non-digit characters**

All the values in the `ram` and `screen_size` columns are numeric values stored as strings, and they contain non-digit characters that need to be removed.

The pandas library contains dozens of vectorized string methods we can use to manipulate text data, many of which perform the same operations as Python string methods. Most vectorized string methods are available using the `Series.str` accessor, which means we can access them by adding `str` between the series name and the method name:

`Series.str.method_name()`

We can use the `Series.str.replace()` method, which is a vectorized version of the Python `str.replace()` method, to remove all the quote characters from every string in the `screen_size` column:

In [16]:
laptops["screen_size"] = laptops["screen_size"].str.replace('"', '')
laptops["screen_size"].unique()

array(['13.3', '15.6', '15.4', '14.0', '12.0', '11.6', '17.3', '10.1',
       '13.5', '12.5', '13.0', '18.4', '13.9', '12.3', '17.0', '15.0',
       '14.1', '11.3'], dtype=object)

1. Use the `Series.str.replace()` method to remove the substring `GB` from the `ram` column.

In [17]:
laptops["ram"] = laptops["ram"].str.replace("GB", "")

2. Use the `Series.unique()` method to assign the unique values in the `ram` column to `unique_ram`.

In [18]:
unique_ram = laptops["ram"].unique()
unique_ram

array(['8', '16', '4', '2', '12', '6', '32', '24', '64'], dtype=object)
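
One detail worth knowing: `Series.str.replace()` has a `regex` parameter, and its default has differed between pandas versions. Passing it explicitly makes the intent clear. The sketch below uses hypothetical sample series to show the difference when the pattern contains a regex metacharacter:

```python
import pandas as pd

ram = pd.Series(["8GB", "16GB", "4GB"])

# regex=False treats "GB" as a literal substring.
ram_digits = ram.str.replace("GB", "", regex=False)
print(ram_digits.tolist())

# With a metacharacter such as ".", the two modes behave very differently.
decimals = pd.Series(["1.5"])
literal = decimals.str.replace(".", "", regex=False)   # removes only the period
as_regex = decimals.str.replace(".", "", regex=True)   # "." matches every character
print(literal.tolist(), as_regex.tolist())
```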

### Converting Columns to Numeric Dtypes

**Convert (or cast) the columns to a numeric dtype**

To do this, we use the `Series.astype()` method. To convert the column to a numeric dtype, we can use either `int` or `float` as the parameter for the method. Since the `int` dtype can't store decimal values, we'll convert the `screen_size` column to the `float` dtype:

In [19]:
laptops["screen_size"] = laptops["screen_size"].astype(float)
laptops["screen_size"].dtype

dtype('float64')

In [20]:
laptops["screen_size"].unique()

array([13.3, 15.6, 15.4, 14. , 12. , 11.6, 17.3, 10.1, 13.5, 12.5, 13. ,
       18.4, 13.9, 12.3, 17. , 15. , 14.1, 11.3])

1. Use the `Series.astype()` method to change the `ram` column to an `integer` dtype.

In [21]:
laptops["ram"] = laptops["ram"].astype(int)

2. Use the `DataFrame.dtypes` attribute to get a list of the column names and types from the `laptops` dataframe. Assign the result to `dtypes`.

In [22]:
dtypes = laptops.dtypes
dtypes

manufacturer     object
model_name       object
category         object
screen_size     float64
screen           object
cpu              object
ram               int64
storage          object
gpu              object
os               object
os_version       object
weight           object
price_euros      object
dtype: object
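
`Series.astype()` raises an error if any value can't be converted. When a column might still contain special cases, `pandas.to_numeric()` with `errors="coerce"` is a more forgiving option that turns unparseable values into `NaN`. A minimal sketch, assuming a hypothetical series with one bad value:

```python
import pandas as pd

# Hypothetical sample; "unknown" is an invented value that can't be parsed.
raw = pd.Series(["8", "16", "unknown"])

# Unparseable entries become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")
print(converted)
```

Because `NaN` is a float, the result is a float column whenever a value had to be coerced.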

### Renaming Columns

**Rename the column(s)**

This is an optional step and can be useful if the non-digit values contain information that helps us understand the data.

The quote characters we removed from the `screen_size` column denoted that the screen size was in inches. To stop us from losing information that helps us understand the data, we can use the `DataFrame.rename()` method to rename the column from `screen_size` to `screen_size_inches`.

We specify the `axis=1` parameter so pandas knows that we want to rename labels in the column axis:

In [23]:
laptops.rename({"screen_size" : "screen_size_inches"}, axis=1, inplace=True)

`inplace=True` modifies the dataframe directly, so we don't have to assign the result back to a variable.

In [24]:
laptops.dtypes

manufacturer           object
model_name             object
category               object
screen_size_inches    float64
screen                 object
cpu                    object
ram                     int64
storage                object
gpu                    object
os                     object
os_version             object
weight                 object
price_euros            object
dtype: object

1. Because the `GB` characters contained useful information about the units (gigabytes) of the laptop's RAM, use the `DataFrame.rename()` method to rename the column from `ram` to `ram_gb`.

In [25]:
laptops.rename({"ram" : "ram_gb"}, axis=1, inplace=True)

2. Use the `Series.describe()` method to return a series of descriptive statistics for the `ram_gb` column. Assign the result to `ram_gb_desc`.

In [26]:
ram_gb_desc = laptops["ram_gb"].describe()
ram_gb_desc

count    1303.000000
mean        8.382195
std         5.084665
min         2.000000
25%         4.000000
50%         8.000000
75%         8.000000
max        64.000000
Name: ram_gb, dtype: float64
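
An equivalent way to rename is to use the `columns` keyword and assign the result back, which avoids `inplace=True` and lets several columns be renamed in one call. This sketch uses a hypothetical one-row dataframe:

```python
import pandas as pd

# Hypothetical one-row dataframe with the two columns renamed in this section.
demo = pd.DataFrame({"screen_size": [13.3], "ram": [8]})

# columns= makes the axis explicit; rename() returns a new dataframe.
renamed = demo.rename(columns={"screen_size": "screen_size_inches", "ram": "ram_gb"})

print(list(renamed.columns))
print(list(demo.columns))  # the original dataframe is unchanged
```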

### Extracting Values from Strings

Sometimes it can be useful to extract non-numeric values from within strings.

In [27]:
laptops["gpu"].head()

0    Intel Iris Plus Graphics 640
1          Intel HD Graphics 6000
2           Intel HD Graphics 620
3              AMD Radeon Pro 455
4    Intel Iris Plus Graphics 650
Name: gpu, dtype: object

The information in this column seems to be a manufacturer (Intel, AMD) followed by a model name/number. Let's extract the manufacturer by itself so we can find the most common ones.

Because each manufacturer is followed by a whitespace character, we can use the `Series.str.split()` method to extract this data:

In [28]:
(laptops["gpu"]
    .head()
    .str.split()
)

0    [Intel, Iris, Plus, Graphics, 640]
1           [Intel, HD, Graphics, 6000]
2            [Intel, HD, Graphics, 620]
3               [AMD, Radeon, Pro, 455]
4    [Intel, Iris, Plus, Graphics, 650]
Name: gpu, dtype: object

This method splits each string on the whitespace; the result is a series containing individual Python lists. Also note that we used parentheses to method chain over multiple lines, which makes our code easier to read.

Just like with lists and ndarrays, we can use bracket notation to access the elements in each list in the series. With series, however, we use the `str` accessor followed by `[]` (brackets).

In [29]:
laptops["gpu"].head().str.split().str[0]

0    Intel
1    Intel
2    Intel
3      AMD
4    Intel
Name: gpu, dtype: object

In [30]:
laptops["gpu_manufacturer"] = (laptops["gpu"]
                                  .str.split()
                                  .str[0]
                              )

In [31]:
laptops.head()

Unnamed: 0,manufacturer,model_name,category,screen_size_inches,screen,cpu,ram_gb,storage,gpu,os,os_version,weight,price_euros,gpu_manufacturer
0,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37kg,133969,Intel
1,Apple,Macbook Air,Ultrabook,13.3,1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34kg,89894,Intel
2,HP,250 G6,Notebook,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,,1.86kg,57500,Intel
3,Apple,MacBook Pro,Ultrabook,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,,1.83kg,253745,AMD
4,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37kg,180360,Intel


In the example code, we have extracted the manufacturer name from the `gpu` column, and assigned it to a new column `gpu_manufacturer`.
1. Extract the manufacturer name from the `cpu` column. Assign it to a new column `cpu_manufacturer`.

In [32]:
laptops["cpu_manufacturer"] = (laptops["cpu"]
                                    .str.split()
                                    .str[0]
                              )

2. Use the `Series.value_counts()` method to find the counts of each manufacturer in `cpu_manufacturer`. Assign the result to `cpu_manufacturer_counts`.

In [33]:
cpu_manufacturer_counts = laptops["cpu_manufacturer"].value_counts()
cpu_manufacturer_counts

Intel      1240
AMD          62
Samsung       1
Name: cpu_manufacturer, dtype: int64
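
If the rest of the string is also of interest, `Series.str.split()` can limit the number of splits with `n` and return separate columns with `expand=True`. The sketch below uses two values from the `gpu` column; the column name `gpu_model` is an assumption made for illustration.

```python
import pandas as pd

gpu = pd.Series(["Intel Iris Plus Graphics 640", "AMD Radeon Pro 455"])

# n=1 splits only at the first whitespace; expand=True returns a dataframe.
parts = gpu.str.split(n=1, expand=True)
parts.columns = ["gpu_manufacturer", "gpu_model"]  # gpu_model is a hypothetical name
print(parts)
```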

### Correcting Bad Values

If your data has been scraped from a webpage or if there was manual data entry involved at some point, you may end up with inconsistent values.

In [34]:
laptops["os"].value_counts()

Windows      1125
No OS          66
Linux          62
Chrome OS      27
macOS          13
Mac OS          8
Android         2
Name: os, dtype: int64

We can see that there are two variations of the Apple operating system -- macOS -- in our dataset: `Mac OS` and `macOS`. One way we can fix this is with the `Series.map()` method. The `Series.map()` method is ideal when we want to change multiple values in a column.

The most common way to use `Series.map()` is with a dictionary.

_Note_: One important thing to remember with `Series.map()` is that if a value from your series doesn't exist as a key in your dictionary, it will convert that value to `NaN`. If this happens in a Jupyter notebook, you can recover by re-running the cells that load and clean the data.

We have created a dictionary for you to use with mapping. Note that we have included both the correct and incorrect spelling of macOS as keys; otherwise, we'd end up with null values.

1. Use the `Series.map()` method with the `mapping_dict` dictionary to correct the values in the `os` column.

In [36]:
mapping_dict = {
    'Android': 'Android',
    'Chrome OS': 'Chrome OS',
    'Linux': 'Linux',
    'Mac OS': 'macOS',
    'No OS': 'No OS',
    'Windows': 'Windows',
    'macOS': 'macOS'
}

laptops["os"] = laptops["os"].map(mapping_dict)
laptops.head()

Unnamed: 0,manufacturer,model_name,category,screen_size_inches,screen,cpu,ram_gb,storage,gpu,os,os_version,weight,price_euros,gpu_manufacturer,cpu_manufacturer
0,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37kg,133969,Intel,Intel
1,Apple,Macbook Air,Ultrabook,13.3,1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34kg,89894,Intel,Intel
2,HP,250 G6,Notebook,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,,1.86kg,57500,Intel,Intel
3,Apple,MacBook Pro,Ultrabook,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,,1.83kg,253745,AMD,Intel
4,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37kg,180360,Intel,Intel
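
To see the `NaN` behavior of `Series.map()` in isolation, the sketch below maps a hypothetical three-value series with a dictionary that lists only the misspelled key. It also shows `Series.replace()`, which leaves unlisted values untouched and can be a simpler choice when only a few values need correcting.

```python
import pandas as pd

os_col = pd.Series(["macOS", "Mac OS", "Windows"])
partial = {"Mac OS": "macOS"}  # deliberately incomplete dictionary

# map(): values without a matching key become NaN.
mapped = os_col.map(partial)

# replace(): values without a matching key are kept as they are.
replaced = os_col.replace(partial)

print(mapped.isna().tolist())
print(replaced.tolist())
```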


### Dropping Missing Values

In pandas, null values will be indicated by either `NaN` or `None`.

Recall that we can use the `DataFrame.isnull()` method to identify missing values, which returns a boolean dataframe. We can then use the `DataFrame.sum()` method to give us a count of the `True` values for each column:

In [37]:
laptops.isnull().sum()

manufacturer            0
model_name              0
category                0
screen_size_inches      0
screen                  0
cpu                     0
ram_gb                  0
storage                 0
gpu                     0
os                      0
os_version            170
weight                  0
price_euros             0
gpu_manufacturer        0
cpu_manufacturer        0
dtype: int64

We have only one column with null values, `os_version`, which has 170 missing values.

There are a few options for handling missing values:
- Remove any rows that have missing values.
- Remove any columns that have missing values.
- Fill the missing values with some other value.
- Leave the missing values as is.

The first two options are often used to prepare data for machine learning algorithms, many of which can't work with data that includes null values. We can use the `DataFrame.dropna()` method to remove or **drop** rows and columns with null values.

The `DataFrame.dropna()` method accepts an `axis` parameter, which indicates whether we want to drop along the column or index axis.

The default value for the `axis` parameter is `0`, so `df.dropna()` returns an identical result to `df.dropna(axis=0)`, dropping all rows with a null value.

The use of `axis=1` will drop any columns with a null value.

1. Use `DataFrame.dropna()` to remove any rows from the laptops dataframe that have null values. Assign the result to `laptops_no_null_rows`.

In [40]:
laptops_no_null_rows = laptops.dropna(axis=0)
laptops_no_null_rows.shape

(1133, 15)

2. Use `DataFrame.dropna()` to remove any columns from the laptops dataframe that have null values. Assign the result to `laptops_no_null_cols`.

In [41]:
laptops_no_null_cols = laptops.dropna(axis=1)
laptops_no_null_cols.shape

(1303, 14)
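
`DataFrame.dropna()` also accepts a `subset` parameter, which restricts the null check to specific columns. The sketch below uses a hypothetical three-row dataframe (the `note` column is invented) to compare the two behaviors:

```python
import pandas as pd

# Hypothetical data: each of the first two rows has a null in a different column.
demo = pd.DataFrame({
    "os": ["Windows", "Linux", "macOS"],
    "os_version": ["10", None, "X"],
    "note": [None, "a", "b"],
})

# Drops every row that has a null in any column.
no_null_rows = demo.dropna()

# Drops only the rows where os_version is null.
has_version = demo.dropna(subset=["os_version"])

print(no_null_rows.shape, has_version.shape)
```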

### Filling Missing Values

While dropping rows or columns is the easiest approach to deal with missing values, it may not always be the _best_ approach. For example, removing a disproportionate amount of one manufacturer's laptops could change our analysis.

Because of this, it's a good idea to explore the missing values in the `os_version` column before making a decision. We can use `Series.value_counts()` to explore all the values in the column, but we'll use a new parameter.

In [42]:
laptops["os_version"].value_counts(dropna=False)

10      1072
NaN      170
7         45
10 S       8
X          8
Name: os_version, dtype: int64

Because we set the `dropna` parameter to `False`, the result includes null values. We can see that the majority of values in the column are `10` and missing values are the next most common.

Let's also explore the `os` column, since it's closely related to the `os_version` column. We'll only look at rows in which the `os_version` is missing:

In [44]:
os_with_null_v = laptops.loc[laptops["os_version"].isnull(), "os"]
os_with_null_v.value_counts()

No OS        66
Linux        62
Chrome OS    27
macOS        13
Android       2
Name: os, dtype: int64

We can observe a few things:
- The most frequent value is "No OS". This is important to note because if there is no OS, there _shouldn't_ be a version defined in the `os_version` column.
- Thirteen of the laptops that come with macOS do not specify the version. We can use our knowledge of macOS to confirm that `os_version` should be equal to `X`.

In both cases, we can fill the missing values to make our data more correct. For the rest of the values, it's probably best to leave them as missing so we don't remove important values.

We can use assignment with a boolean comparison to perform this replacement, like below:

In [47]:
laptops.loc[laptops["os"] == "macOS", "os_version"] = "X"
laptops.head()

Unnamed: 0,manufacturer,model_name,category,screen_size_inches,screen,cpu,ram_gb,storage,gpu,os,os_version,weight,price_euros,gpu_manufacturer,cpu_manufacturer
0,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,X,1.37kg,133969,Intel,Intel
1,Apple,Macbook Air,Ultrabook,13.3,1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,X,1.34kg,89894,Intel,Intel
2,HP,250 G6,Notebook,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,,1.86kg,57500,Intel,Intel
3,Apple,MacBook Pro,Ultrabook,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,X,1.83kg,253745,AMD,Intel
4,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,X,1.37kg,180360,Intel,Intel


For rows with `No OS` values, let's replace the missing value in the `os_version` column with the value `Version Unknown`.

1. Use a boolean array to identify rows that have the value `No OS` for the `os` column. Then, use assignment to assign the value `Version Unknown` to the `os_version` column for those rows.

In [48]:
value_counts_before = laptops.loc[laptops["os_version"].isnull(), "os"].value_counts()

laptops.loc[laptops["os"] == "No OS", "os_version"] = "Version Unknown"

2. Use the syntax below to create `value_counts_after` variable:
`value_counts_after = laptops.loc[laptops["os_version"].isnull(), "os"].value_counts()`

In [49]:
value_counts_after = laptops.loc[laptops["os_version"].isnull(), "os"].value_counts()

3. After running your code, check the difference between `value_counts_before` and `value_counts_after`.

In [50]:
value_counts_before

No OS        66
Linux        62
Chrome OS    27
Android       2
Name: os, dtype: int64

In [51]:
value_counts_after

Linux        62
Chrome OS    27
Android       2
Name: os, dtype: int64
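
The assignments above overwrite `os_version` for every matching row, including rows that already had a value. If existing values should be preserved, the boolean mask can be combined with a null check. This is a sketch on a hypothetical four-row dataframe; the version `10.12` is invented to stand in for a value that is already present.

```python
import pandas as pd

# Hypothetical data; "10.12" is an invented, already-present version.
demo = pd.DataFrame({
    "os": ["macOS", "macOS", "No OS", "Linux"],
    "os_version": ["10.12", None, None, None],
})

is_missing = demo["os_version"].isnull()

# Combine the null check with each condition so only missing values are filled.
demo.loc[is_missing & (demo["os"] == "macOS"), "os_version"] = "X"
demo.loc[is_missing & (demo["os"] == "No OS"), "os_version"] = "Version Unknown"

print(demo)
```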

### Challenge: Clean a String Column

Clean the `weight` column.

In [52]:
laptops["weight"].head()

0    1.37kg
1    1.34kg
2    1.86kg
3    1.83kg
4    1.37kg
Name: weight, dtype: object

While it appears that the `weight` column may just need the `kg` characters removed from the end of each string, there is one special case - one of the values ends with `kgs` so you'll have to remove both `kg` and `kgs` characters.

In the last step of this challenge, we'll also ask you to use the `DataFrame.to_csv()` method to save the cleaned data to a CSV file. It's a good idea to save a CSV when you finish cleaning in case you wish to do analysis later.

**Syntax:**

`df.to_csv('filename.csv', index=False)`

By default, pandas will save the index labels as a column in the CSV file. Our dataset has integer labels that don't contain any data, so we don't need to save the index.

1. Convert the values in the `weight` column to numeric values.

In [53]:
laptops["weight"] = laptops["weight"].str.replace("kgs", "")
laptops["weight"] = laptops["weight"].str.replace("kg", "")

laptops["weight"] = laptops["weight"].astype(float)

2. Rename the `weight` column to `weight_kg`.

In [54]:
laptops.rename({"weight" : "weight_kg"}, axis=1, inplace=True)

laptops.head()

Unnamed: 0,manufacturer,model_name,category,screen_size_inches,screen,cpu,ram_gb,storage,gpu,os,os_version,weight_kg,price_euros,gpu_manufacturer,cpu_manufacturer
0,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,X,1.37,133969,Intel,Intel
1,Apple,Macbook Air,Ultrabook,13.3,1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,X,1.34,89894,Intel,Intel
2,HP,250 G6,Notebook,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,Version Unknown,1.86,57500,Intel,Intel
3,Apple,MacBook Pro,Ultrabook,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,X,1.83,253745,AMD,Intel
4,Apple,MacBook Pro,Ultrabook,13.3,IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,X,1.37,180360,Intel,Intel


3. Use the `DataFrame.to_csv()` method to save the laptops dataframe to a CSV file `laptops_cleaned.csv` _without_ index labels.

In [55]:
laptops.to_csv('laptops_cleaned.csv', index=False)
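
The two `str.replace()` calls in this challenge can also be expressed as a single regular expression, and the effect of `index=False` can be checked by writing to an in-memory buffer and reading the result back. This is a sketch on hypothetical data; the `4kgs` value is invented to represent the special case.

```python
import io

import pandas as pd

# Hypothetical two-row sample; "4kgs" stands in for the special case.
demo = pd.DataFrame({
    "model_name": ["MacBook Pro", "250 G6"],
    "weight": ["1.37kg", "4kgs"],
})

# "kgs?$" matches "kg" or "kgs" at the end of the string.
demo["weight"] = demo["weight"].str.replace(r"kgs?$", "", regex=True).astype(float)
demo = demo.rename(columns={"weight": "weight_kg"})

# Write without index labels to an in-memory buffer, then read it back.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

With `index=False`, the restored dataframe has no extra unnamed index column.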

### Extra Steps:

1. Convert the `price_euros` column to a numeric dtype.

2. Extract the screen resolution from the `screen` column.

3. Extract the processor speed from the `cpu` column.
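
For steps 2 and 3, `Series.str.extract()` with a regular expression is one possible approach. The sketch below is a starting point rather than a full solution: it uses a few values copied from the `screen` and `cpu` columns shown earlier, and the new column names are assumptions. Values in the full dataset may not all follow these patterns.

```python
import pandas as pd

screen = pd.Series([
    "IPS Panel Retina Display 2560x1600",
    "1440x900",
    "Full HD 1920x1080",
])

# Two capture groups return a dataframe with one column per group.
resolution = screen.str.extract(r"(\d+)x(\d+)").astype(int)
resolution.columns = ["screen_width_px", "screen_height_px"]  # hypothetical names

cpu = pd.Series(["Intel Core i5 2.3GHz", "Intel Core i5 7200U 2.5GHz"])

# expand=False returns a series when there is a single capture group.
cpu_speed_ghz = cpu.str.extract(r"(\d+(?:\.\d+)?)GHz", expand=False).astype(float)

print(resolution)
print(cpu_speed_ghz.tolist())
```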