# Introduction

Abalone is a common name for any of a group of small to very large marine gastropod molluscs in the family Haliotidae. Abalones are marine snails.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/33/LivingAbalone.JPG/330px-LivingAbalone.JPG)

The shells of abalones have a low, open spiral structure, and are characterized by several open respiratory pores in a row near the shell's outer edge. The thick inner layer of the shell is composed of nacre (mother-of-pearl), which in many species is highly iridescent, giving rise to a range of strong, changeable colors which make the shells attractive to humans as decorative objects, jewelry, and as a source of colorful mother-of-pearl.

The flesh of abalones is widely considered to be a desirable food, and is consumed raw or cooked by a variety of cultures.

More Information on [Abalone](https://en.wikipedia.org/wiki/Abalone)

The dataset being used is available on [UCI datasets](https://archive.ics.uci.edu/ml/datasets/Abalone)

# Solve the questions in this notebook starting from #1 and then solve the rest sequentially.

# 1
Read the dataset https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data using pandas

# 2
Check the column names

# 3
As is apparent, the first row of the data has become the column names. Import the dataset again, resolving this issue without losing any data.
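A minimal sketch of the idea, using a small in-memory sample rather than the real file: `header=None` tells pandas that the text has no header row, so the first record stays in the data. The variable names and the toy values below are made up for illustration.

```python
import io

import pandas as pd

# Toy CSV text without a header row (values are invented for illustration).
raw = "M,0.45,0.36,0.09\nF,0.53,0.42,0.13\nI,0.33,0.25,0.08\n"

# Default behaviour: the first record is consumed as the column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data and numbers the columns 0..n-1.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)    # one row fewer than the file contains
print(without_header.shape)  # all rows preserved
```

Comparing the two shapes is a quick way to confirm that no record was lost.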

# 4
Replace the default column names with
["Sex", "Length", "Diameter", "Height", "Whole_weight", "Shucked_weight", "Viscera_weight", "Shell_weight", "Rings"]

The description of the columns is given below


|Name | Data Type | Measurement Unit | Description|
|-----|-----------|----------|---|
|Sex | nominal | -- | M, F, and I (infant)|
|Length | continuous | mm | Longest shell measurement|
|Diameter | continuous | mm | perpendicular to length|
|Height | continuous | mm | with meat in shell|
|Whole weight | continuous | grams | whole abalone|
|Shucked weight | continuous | grams | weight of meat|
|Viscera weight | continuous | grams | gut weight (after bleeding)|
|Shell weight | continuous | grams | after being dried|
|Rings | integer | -- | +1.5 gives the age in years|

# 5
Check the shape of the data

# 6
Calculate the descriptive statistics of the data

# 7
Check the datatype of all of the columns

# 8
Check if there are any NaN (missing) values in the dataset

# 9
What is the mean Height of abalones according to the dataset?

# 10
Check the mode of Sex column

# 11
What is the minimum and maximum Length of Abalones in this dataset **in centimeters**?
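A small sketch of the unit conversion on toy data, assuming the values are in millimeters as the column table states: since 1 cm = 10 mm, dividing by 10 gives centimeters. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Toy lengths, assumed to be in millimeters (values are invented).
toy = pd.DataFrame({"Length": [12.0, 45.0, 30.0]})

# 1 cm = 10 mm, so divide by 10 to convert mm to cm.
length_cm = toy["Length"] / 10

print(length_cm.min(), length_cm.max())
```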

# 12
The Sex column contains three unique values, check what they are

# 13
The unique values and their meanings are M for Male, F for Female, and I for Infant.

Check the count of all three values

# 14
Draw a pie chart of all of the values

# 15
Try sorting the values according to Height in ascending order

# 16
Sort the data according to Diameter in descending order

# 17
Sort the data first according to Length and then Height, both ascending

# 18
Sort the data according to Length, Diameter and Height in the same order and all in ascending order

# 19
Sort the data first according to Length (ascending) and then Height (descending)

# 20
Sort the data according to Length (ascending), Diameter (descending) and Height (ascending) in the same order
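Multi-key sorts with mixed directions can be sketched on toy data as follows. `sort_values` accepts a list for `by` and a matching list of booleans for `ascending`; the frame `toy` and its values are invented for illustration.

```python
import pandas as pd

# Toy frame with ties in Length and Diameter so every sort key matters.
toy = pd.DataFrame(
    {
        "Length": [2, 1, 2, 1],
        "Diameter": [5, 7, 9, 7],
        "Height": [3, 4, 1, 2],
    }
)

# Keys are applied left to right; each key gets its own direction.
result = toy.sort_values(
    by=["Length", "Diameter", "Height"],
    ascending=[True, False, True],
)

print(result)
```

The original index labels travel with their rows, which makes it easy to see how the rows were reordered.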

# 21
Check which weigh more on average: males or females

# 22
Check the maximum height when grouped by Sex

# 23
Check the minimum Length in each gender when grouped by Sex
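A minimal sketch of grouped aggregation on toy data: `groupby` splits the rows by a key column, and an aggregation such as `mean`, `max`, or `min` is then computed per group. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Toy data (values are invented for illustration).
toy = pd.DataFrame(
    {
        "Sex": ["M", "F", "M", "F", "I"],
        "Whole_weight": [1.0, 2.0, 3.0, 4.0, 0.5],
        "Height": [0.1, 0.2, 0.3, 0.15, 0.05],
    }
)

# One aggregated value per group, indexed by the group key.
mean_weight = toy.groupby("Sex")["Whole_weight"].mean()
max_height = toy.groupby("Sex")["Height"].max()

print(mean_weight)
print(max_height)
```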

# 24
From the dataframe, select only the Diameter column without using loc or iloc

# 25
From the dataframe select the Diameter and the Length columns without using loc or iloc

# 26
Select the rows from index of 10 to 15 using iloc

# 27
Select the first five rows using iloc

# 28
Select the last 7 rows using iloc

# 29
Select the row indexes 10 to 15 and column indexes 2 to 5 using iloc

# 30
Select the first five rows and the last five column indexes using iloc

# 31
Select the rows 5 to 10 using loc

# 32
Select rows 100 to 105 and columns Height to Shell_weight using loc

# 33
Select the first two rows and first two columns using loc

# 34
Select only rows with indexes 100, 150, 130 and 110 using loc

# 35
Select the columns Length, Rings and Height using loc

# 36
Select the rows 100, 150 and 99 and columns Length, Rings and Height
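One detail worth keeping in mind for the selection tasks above: `iloc` slices by position and excludes the end, while `loc` slices by label and includes it. The sketch below shows the difference on a toy frame; the frame and its values are invented for illustration.

```python
import pandas as pd

# Toy frame with the default integer index 0..19.
toy = pd.DataFrame(
    {
        "Length": range(20),
        "Height": range(20, 40),
        "Rings": range(40, 60),
    }
)

# Position-based slice: end is exclusive, so this yields rows 10..14.
by_position = toy.iloc[10:15]

# Label-based slice: end is inclusive, so this yields rows 10..15.
by_label = toy.loc[10:15]

# Lists of labels select rows and columns in exactly the order given.
picked = toy.loc[[15, 3, 7], ["Rings", "Length"]]

print(len(by_position), len(by_label))
print(picked)
```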

# 37
Draw a line plot of Whole_weight

# 38
Draw the line plot of Whole_weight and Rings (representing age) on same plot

If you did not sort the data, it is really difficult to study the relationship between Whole_weight and Rings. Let's try to fix this.

# 39
Sort the data according to Whole_weight and store the result in sorted_df

# 40
The new sorted dataframe will not have its index in ascending order. Reset the index without adding any new columns.
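A minimal sketch on toy data: after sorting, the old index labels stay attached to their rows. `reset_index(drop=True)` replaces them with 0..n-1 and discards the old labels instead of storing them in a new column. The names `toy`, `sorted_toy`, and `reset_toy` are hypothetical.

```python
import pandas as pd

# Toy data (values are invented for illustration).
toy = pd.DataFrame({"Whole_weight": [0.9, 0.2, 0.5]})

# Sorting reorders the rows but keeps the original index labels.
sorted_toy = toy.sort_values("Whole_weight")

# drop=True throws the old index away rather than adding it as a column.
reset_toy = sorted_toy.reset_index(drop=True)

print(list(sorted_toy.index))
print(list(reset_toy.index))
```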

# 41
Now draw the plot of Whole_weight again, this time using sorted_df

# 42
Now that the dataset is sorted, let's try plotting Whole_weight and Rings again, this time using sorted_df

# 43
The relation between Whole_weight and Rings is now more prominent. As Rings (age) increases, Whole_weight also increases.

But since Whole_weight and Rings are on different scales, let's try to bring them to the same scale by multiplying Whole_weight by 10 and plotting afterwards

| Column | Min Before Scaling | Min After Scaling | Max Before Scaling | Max After Scaling |
| --- | --- | --- | --- | --- |
| Whole_weight | 0.002 | 0.02 | 2.82 | 28.2 |
| Rings | 1 | 1 | 29 | 29 |

# 44
As we have seen above, there is a clear linear relationship between the columns. Let's explore such relationships further using correlation. Calculate the correlation between all the columns.
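A minimal sketch on toy data: `DataFrame.corr` computes pairwise Pearson correlation by default. Because a text column such as Sex cannot be correlated, this sketch first keeps only the numeric columns with `select_dtypes`, which is one possible way to handle it. The frame `toy` and its values are invented.

```python
import pandas as pd

# Toy data: Whole_weight rises with Length, Rings falls with Length.
toy = pd.DataFrame(
    {
        "Sex": ["M", "F", "I", "M"],
        "Length": [1.0, 2.0, 3.0, 4.0],
        "Whole_weight": [2.0, 4.0, 6.0, 8.0],
        "Rings": [4, 3, 2, 1],
    }
)

# Keep numeric columns only, then compute the correlation matrix.
corr = toy.select_dtypes(include="number").corr()

print(corr)
```

Values close to 1 indicate a strong positive linear relationship, and values close to -1 a strong negative one.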

# 45
Select all the Abalones with the Length more than 0.75

# 46
Select all the Abalones with Shell_weight of 0.005 or less

# 47
Select all Abalones with Length more than 0.5 and Rings less than 7

# 48
Select all Abalones with Diameter more than 0.3 and Height more than 0.4

# 49
Select all Abalones with Rings of 29 or Shell_weight of more than 1
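Conditions like the ones above can be combined with boolean masks. In pandas, `&` means "and" and `|` means "or", and each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. A sketch on invented toy data:

```python
import pandas as pd

# Toy data (values are invented for illustration).
toy = pd.DataFrame(
    {
        "Length": [0.4, 0.6, 0.7, 0.3],
        "Rings": [5, 6, 9, 29],
        "Shell_weight": [0.1, 0.2, 1.2, 0.3],
    }
)

# Both conditions must hold.
both = toy[(toy["Length"] > 0.5) & (toy["Rings"] < 7)]

# At least one condition must hold.
either = toy[(toy["Rings"] == 29) | (toy["Shell_weight"] > 1)]

print(both)
print(either)
```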

# 50
Create a deep copy of the dataframe to a variable called df_copy

# 51
1 decimeter = 100 millimeters (so 1 millimeter = 0.01 decimeter)

Convert all the values in **df_copy** Length, Diameter, Height to decimeter

Note: If you accidentally do this change in df instead of df_copy, you will need to re-read the dataset using pd.read_csv
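A sketch of working on a copy, using toy data: `copy(deep=True)` duplicates the data, so changes to the copy leave the original untouched. Since 1 decimeter equals 100 millimeters, converting millimeters to decimeters means dividing by 100. The names `toy` and `toy_copy` and the values are hypothetical.

```python
import pandas as pd

# Toy measurements, assumed to be in millimeters (values are invented).
toy = pd.DataFrame({"Length": [100.0, 250.0], "Diameter": [50.0, 200.0]})

# Deep copy: the new frame owns its own data.
toy_copy = toy.copy(deep=True)

# 1 dm = 100 mm, so divide by 100 to convert mm to dm.
cols = ["Length", "Diameter"]
toy_copy[cols] = toy_copy[cols] / 100

print(toy)       # original values are unchanged
print(toy_copy)  # converted values
```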

# 52
Change the column names in df_copy to reflect the above change

| Old Column Name | New Column Name |
| --- | --- |
| Length | Length_dm |
| Diameter | Diameter_dm |
| Height | Height_dm |
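A mapping table like the one above translates naturally into a dictionary passed to `rename`. In this sketch on toy data, columns missing from the dictionary keep their names, and `rename` returns a new frame unless the result is assigned back. The frame `toy` is hypothetical.

```python
import pandas as pd

# Toy data (values are invented for illustration).
toy = pd.DataFrame({"Length": [1.0], "Height": [0.2], "Rings": [5]})

# Keys are old names, values are new names.
renamed = toy.rename(columns={"Length": "Length_dm", "Height": "Height_dm"})

print(list(renamed.columns))
```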

# 53
The values in **df_copy** in columns Whole_weight, Shucked_weight, Viscera_weight and Shell_weight are given in grams, convert them into milligrams

Note: 1 gram = 1000 milligrams. Also, if you accidentally make this change in df instead of df_copy, you will need to re-read the dataset using pd.read_csv.

# 54
Rename the **df_copy** columns to reflect this change

| Old Column Name | New Column Name |
| --- | --- |
| Whole_weight | Whole_weight_mg |
| Shucked_weight | Shucked_weight_mg |
| Viscera_weight | Viscera_weight_mg |
| Shell_weight | Shell_weight_mg |

# 55
Delete the columns Sex and Rings from **df_copy**
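A minimal sketch of removing columns on toy data: `drop(columns=[...])` returns a new frame without the listed columns, so the result has to be assigned to take effect. The names `toy` and `trimmed` are hypothetical.

```python
import pandas as pd

# Toy data (values are invented for illustration).
toy = pd.DataFrame({"Sex": ["M"], "Length": [1.0], "Rings": [5]})

# The original frame is left as it was; the result holds the remaining columns.
trimmed = toy.drop(columns=["Sex", "Rings"])

print(list(trimmed.columns))
```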

# 56
Add another column to **df_copy** named Length_mm containing all Length_dm values in mm

Note: 1 dm = 100 mm