# Pandas 2

**[1] Basic operations**<br>
- Series: Add (or delete) a value<br>
- DataFrame: Add (or delete) a column (or row)<br>
- Copy a DataFrame<br>
- Sort a DataFrame<br>

**[2] Combining DataFrames**<br>
- Concatenate<br>
- Merge<br>

In [None]:
import pandas as pd

## [1] Basic operations

### [1.1] Series: Add (or delete) a value

In [None]:
s = pd.Series([10, 20, 30, 40])
s

- **Add a new value**<br>

The <code>append()</code> method was deprecated in pandas 1.4.0 and removed in pandas 2.0. Use <code>concat()</code> instead. 

In [None]:
# Check your pandas version
pd.__version__
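One way to add a value with `concat()` is to wrap the new value in its own Series first. This is a minimal sketch: the value `50` and the names `new_value` and `s_extended` are made up for illustration, and `ignore_index=True` is one possible choice for renumbering the result.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Wrap the new value in a Series, then concatenate it to the original.
new_value = pd.Series([50])

# ignore_index=True renumbers the result 0..4 instead of repeating label 0.
s_extended = pd.concat([s, new_value], ignore_index=True)
print(s_extended)
```

Like most pandas operations, `pd.concat()` returns a new object, so `s` itself still holds four values.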

- **Delete a value**

In [None]:
# Delete the value at the given index.
s.drop(2)

In [None]:
# By default, the "inplace" parameter of drop() is False, so the original Series is unchanged.
s
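To keep the result of `drop()`, either assign the returned Series to a variable or pass `inplace=True`. The sketch below shows both options; the names `s_dropped` and `s_copy` are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Option 1: keep the returned Series under a new name.
s_dropped = s.drop(2)

# Option 2: modify the object itself (here a copy, so s stays intact).
# With inplace=True, drop() returns None.
s_copy = s.copy()
s_copy.drop(2, inplace=True)

print(s_dropped.tolist())  # [10, 20, 40]
print(s_copy.tolist())     # [10, 20, 40]
```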

### [1.2] DataFrame: Add (or delete) a column (or row)

In [None]:
df = pd.DataFrame({"col1":[10, 20, 30, 40], "col2":["A","B","C","D"]})
df

- **Add a new column**

In [None]:
df["col3"] = [1.5, 2.5, 3.5, 4.5]
df

- **Add a new row**<br>

The <code>append()</code> method was deprecated in pandas 1.4.0 and removed in pandas 2.0. Use <code>concat()</code> instead. 
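A row can be added with `concat()` by first building a one-row DataFrame with matching column names. This is a sketch with made-up row values; it rebuilds a two-column `df` so that it runs on its own, and `new_row` and `df_extended` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"col1": [10, 20, 30, 40], "col2": ["A", "B", "C", "D"]})

# A one-row DataFrame with the same column names (values are invented).
new_row = pd.DataFrame({"col1": [50], "col2": ["E"]})

# ignore_index=True gives the new row the label 4 instead of a second 0.
df_extended = pd.concat([df, new_row], ignore_index=True)
print(df_extended)
```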

- **Delete a column**

In [None]:
df.drop(["col3"], axis = 1)  # same as df.drop(["col3"], axis = "columns")

- **Delete a row**

In [None]:
df.drop([3], axis = 0) # same as df.drop([3], axis = "index")
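`drop()` also accepts the keyword arguments `columns` and `index`, which some readers find easier to follow than `axis`. A short sketch, with `df` rebuilt here so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"col1": [10, 20, 30, 40],
                   "col2": ["A", "B", "C", "D"],
                   "col3": [1.5, 2.5, 3.5, 4.5]})

# Equivalent to df.drop(["col3"], axis=1)
without_col3 = df.drop(columns=["col3"])

# Equivalent to df.drop([3], axis=0)
without_row3 = df.drop(index=[3])

print(list(without_col3.columns))  # ['col1', 'col2']
print(list(without_row3.index))    # [0, 1, 2]
```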

### [1.3] Create a copy of a DataFrame

In [None]:
df1 = pd.DataFrame({"col1":[10, 20, 30, 40], "col2":["A","B","C","D"]})
df1

In [None]:
# Make a copy of df1
df2 = df1.copy()

# Add a new column to df2
df2["col3"] = [1.5, 2.5, 3.5, 4.5]

# Changes on df2 do not affect df1
display(df1) 
display(df2)

## Exercise.A

**(A.1) Given the dataframe below, create a copy named <code>company_df</code> and use it for (A.2) and (A.3).**

In [None]:
company_raw_df = pd.DataFrame({"company_name":['JPMorgan Chase','Apple','Bank of America','Amazon','Microsoft'],
                            "profit":[40.4, 63.9, 17.9, 21.3, 51.3],
                            "assets":[3689.3, 354.1, 2832.2, 321.2, 304.1]})

**(A.2) Add a new column named <code>market_value</code> with a list of values <code>464.8, 2252.3, 336.3, 1711.8, 1966.6</code>.**

**(A.3) Drop the column <code>assets</code>.** (Use inplace = True.)

### [1.4] Sort a DataFrame

In [None]:
df = pd.DataFrame({"state": ["Ohio","Ohio","Ohio","Nevada","Nevada","Nevada"],
                    "year":[2000,2001,2002,2001,2002,2003],
                    "pop":[1.5,1.7,3.6,2.4,2.9,3.2]})
df

- **Sort a DataFrame by one column**

In [None]:
df.sort_values(by = "year")

- **Sort a DataFrame in descending order**

In [None]:
df.sort_values(by = "year", ascending = False)

- **Sort a DataFrame by multiple columns**

In [None]:
df.sort_values(by = ["state", "year"])

In [None]:
df.sort_values(by = ["state", "year"], ascending = [True, False])

- **Sort a DataFrame (inplace = True)**

In [None]:
df.sort_values(by = "year", ascending = False, inplace = True)
df

- **Reset index after sorting**<br>

In [None]:
# Drop the old index
df.reset_index(drop = True)

In [None]:
# Add the old index as an additional column to your DataFrame
df.reset_index(drop = False)

## Exercise.B

**(B.1) Use the dataframe <code>company_raw_df</code> from (A.1). Sort the dataframe by the <code>profit</code> column in descending order and display the result.**

**(B.2) Store the result returned in (B.1) in a new variable named <code>company_sorted_df</code>.**

**(B.3) Reset the index of <code>company_sorted_df</code> and drop the old index.**

## [2] Combining DataFrames

### [2.1] Concatenation

- **Concatenate series (by default, axis = 0)**

In [None]:
s1 = pd.Series([0,1], index = ['a','b'])
s2 = pd.Series([2,3,4], index = ['c','d','e'])
s3 = pd.Series([5,6], index = ['f','g'])

In [None]:
# Concatenate three series into one series
pd.concat([s1,s2,s3])

- **Concatenate series (axis = 1)**

In [None]:
# Concatenate series along the columns 
pd.concat([s1,s2,s3], axis=1)
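When the series do not share index labels, concatenating along the columns aligns them on the union of all labels and fills the gaps with `NaN`. The sketch below checks this on the three series above; the name `wide` is illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3, 4], index=["c", "d", "e"])
s3 = pd.Series([5, 6], index=["f", "g"])

# One row per distinct label (a..g), one column per input series.
wide = pd.concat([s1, s2, s3], axis=1)

print(wide.shape)                # (7, 3)
print(wide.isna().sum().sum())   # 14 missing cells out of 21
```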

- **Concatenate series with the same index.**

In [None]:
s4 = pd.Series([0,1,2], index = ['a','b','c'])
s5 = pd.Series([3,4,5], index = ['a','b','c'])
s6 = pd.Series([6,7,8], index = ['a','b','c'])

In [None]:
pd.concat([s4,s5,s6], axis=1)

- **Concatenate DataFrames (by default, axis = 0)**

In [None]:
df1 = pd.DataFrame({"col1":[1,2,3],"col2":[4,5,6],"col3":[7,8,9]}, index = ['a','b','c'])
df2 = pd.DataFrame({"col1":[11,22,33],"col2":[44,55,66],"col3":[77,88,99]},index = ['a','b','c'])
display(df1)
display(df2)

In [None]:
pd.concat([df1,df2])

In [None]:
pd.concat([df1,df2], ignore_index = True)

- **Concatenate DataFrames (axis = 1)**

In [None]:
pd.concat([df1,df2], axis = 1)

## Exercise.C

**(C.1) Import the datasets <code>municipality_info_part1.csv</code> and <code>municipality_info_part2.csv</code> as dataframes. The columns in the two datasets are described as follows. Display the first five rows of each dataset.**
- Municipality_number (object)
- Population (int)
- Area (float)

Note: Use the argument "dtype" to specify the data types.<br>
<code>dtype = {"Municipality_number": object, "Population": int,  "Area": float} </code>.

**(C.2) How many rows are there in each dataframe?**

**(C.3) Concatenate the two dataframes from (C.1) along the rows and assign the returned dataframe to a new variable named <code>mcp_info</code>.**

**(C.4) How many rows are in the dataframe <code>mcp_info</code>?**

### [2.2] Merge

In [None]:
df1 = pd.DataFrame({'employID':['E011','E012','E013','E014','E015','E016','E017'], 
                    'name':['John','Diana','Matthew','Jerry','Kathy','Sara','Alex']})
df2 = pd.DataFrame({'employID':['E010','E012','E013','E015','E016','E017'], 
                    'birthday':['20-07','12-06','18-01','16-05','02-10','19-08']})

display(df1)
display(df2)

- **Left join**

In [None]:
pd.merge(df1,df2, how = 'left', on = 'employID' )

- **Inner join**

In [None]:
pd.merge(df1,df2, how = 'inner', on = 'employID' )

- **Outer join**

In [None]:
pd.merge(df1,df2, how = 'outer', on = 'employID' )
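To see how the three join types differ, it can help to compare row counts and to ask `merge()` where each row came from. The sketch below uses the `indicator=True` argument, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`; the variable names `left`, `inner`, and `outer` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"employID": ["E011", "E012", "E013", "E014", "E015", "E016", "E017"],
                    "name": ["John", "Diana", "Matthew", "Jerry", "Kathy", "Sara", "Alex"]})
df2 = pd.DataFrame({"employID": ["E010", "E012", "E013", "E015", "E016", "E017"],
                    "birthday": ["20-07", "12-06", "18-01", "16-05", "02-10", "19-08"]})

left = pd.merge(df1, df2, how="left", on="employID")
inner = pd.merge(df1, df2, how="inner", on="employID")
outer = pd.merge(df1, df2, how="outer", on="employID", indicator=True)

print(len(left), len(inner), len(outer))  # 7 5 8

# The _merge column records the origin of each row in the outer join.
print(outer[["employID", "_merge"]])
```

Here E011 and E014 appear only in `df1`, and E010 appears only in `df2`, which explains why the outer join has eight rows and the inner join five.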

## Exercise.D

**(D.1) Import the dataset <code>municipality_name.csv</code> as a dataframe named <code>mcp_name</code>. The columns in the dataset are described as follows.**<br>
- Municipality_number (object)
- Municipality_name (object)

Hint: Use the argument <code>encoding = "iso8859_10"</code> to specify the character encoding. 


**(D.2) The dataframe <code>mcp_info</code> obtained in (C.3) does not contain the municipality names. Look up "Municipality_name" in the dataframe <code>mcp_name</code> and add it to <code>mcp_info</code> as a new column.** <br>

Expected result:

||Municipality_number|Population|Area|Municipality_name|
|--:|--:|--:|--:|--:|
|**0**|0301|673469|454.03|OSLO|
|**1**|1101|14898|431.66|EIGERSUND|
|**2**|...|...|...|...|

**(D.3) List the five most populous municipalities.**