# Introduction to DataFrames

### 1. Load Julia packages

In [3]:
using DataFrames, CSV, Statistics

### 2. Construct a DataFrame from scratch
The following data table contains information about cities in China.

 - rows = instances = cities
 - columns = features = attributes of cities

In [4]:
cities = ["Hangzhou", "Ningbo", "Wenzhou"];
populations = [2000, 1500, 1000];

Now we use `DataFrame` to build a table from these two vectors, one column per vector.

In [6]:
df_cities = DataFrame(cities = cities, population = populations)

| Row | cities | population |
|-----|--------|------------|
|     | String | Int64 |
| 1 | Hangzhou | 2000 |
| 2 | Ningbo | 1500 |
| 3 | Wenzhou | 1000 |
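
The second header row of the output lists each column's element type. As a small sketch, the same information can be queried directly; the variable names `col_names` and `col_types` are only illustrative, and `names` returning strings assumes a 1.x release of DataFrames.

```julia
using DataFrames

# Same toy data as in the cell above
df = DataFrame(cities = ["Hangzhou", "Ningbo", "Wenzhou"],
               population = [2000, 1500, 1000])

col_names = names(df)             # column names
col_types = eltype.(eachcol(df))  # element type of each column
```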


How to **append** rows to a DataFrame

Suppose we want to append the following two cities:

 - Cixi, with a population of 200
 - Lishui, with a population of 500

 - Approach 1: treat a row as an ***array*** whose values are listed in column order.

In [7]:
push!(df_cities, ["Cixi", 200])

| Row | cities | population |
|-----|--------|------------|
|     | String | Int64 |
| 1 | Hangzhou | 2000 |
| 2 | Ningbo | 1500 |
| 3 | Wenzhou | 1000 |
| 4 | Cixi | 200 |


 - Approach 2: treat a row as a ***dictionary*** that maps column names to values.

In [11]:
new_row = Dict(:cities => "Lishui", :population => 500)


Dict{Symbol, Any} with 2 entries:
  :cities     => "Lishui"
  :population => 500

In [12]:
push!(df_cities, new_row)

| Row | cities | population |
|-----|--------|------------|
|     | String | Int64 |
| 1 | Hangzhou | 2000 |
| 2 | Ningbo | 1500 |
| 3 | Wenzhou | 1000 |
| 4 | Cixi | 200 |
| 5 | Lishui | 500 |
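
Two related options are worth knowing, shown here as a minimal sketch on a fresh copy of the data. A `NamedTuple` behaves like the dictionary approach (fields are matched to columns by name), and `append!` adds all rows of another table in one call. The names `df` and `more` are only illustrative.

```julia
using DataFrames

df = DataFrame(cities = ["Hangzhou", "Ningbo", "Wenzhou"],
               population = [2000, 1500, 1000])

# A NamedTuple is matched to columns by name, so field order does not matter.
push!(df, (population = 200, cities = "Cixi"))

# append! adds every row of another table that has the same column names.
more = DataFrame(cities = ["Lishui"], population = [500])
append!(df, more)
```

Matching by name is less error-prone than the positional array approach once a table has many columns.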


How to **append columns** to a DataFrame

The annual rainfall (in inches) of each city is as follows:

 - Hangzhou: 51.0
 - Ningbo: 62.0
 - Wenzhou: 65.0
 - Cixi: 62.5
 - Lishui: 50.0

In [13]:
df_cities[!, :rainfall] = [51.0, 62.0, 65.0, 62.5, 50.0]

5-element Vector{Float64}:
 51.0
 62.0
 65.0
 62.5
 50.0

In [14]:
df_cities

| Row | cities | population | rainfall |
|-----|--------|------------|----------|
|     | String | Int64 | Float64 |
| 1 | Hangzhou | 2000 | 51.0 |
| 2 | Ningbo | 1500 | 62.0 |
| 3 | Wenzhou | 1000 | 65.0 |
| 4 | Cixi | 200 | 62.5 |
| 5 | Lishui | 500 | 50.0 |
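
One detail of the `!` row selector is easy to miss: assigning with `df[!, :col] = v` stores the vector `v` itself rather than a copy. The sketch below illustrates this on a fresh copy of the data; `rain`, `shares_memory` and `independent` are illustrative names only.

```julia
using DataFrames

df = DataFrame(cities = ["Hangzhou", "Ningbo", "Wenzhou", "Cixi", "Lishui"],
               population = [2000, 1500, 1000, 200, 500])

rain = [51.0, 62.0, 65.0, 62.5, 50.0]

df[!, :rainfall] = rain           # stores `rain` itself, no copy is made
shares_memory = df.rainfall === rain

df[!, :rainfall] = copy(rain)     # store an independent copy instead
independent = df.rainfall !== rain
```

If the original vector is modified later, passing a copy keeps the DataFrame column unaffected.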


Another way to add a column is `insertcols!` from DataFrames, which lets us choose the position of the new column.

We want to insert a second column that records the province of each city.

In [21]:
insertcols!(df_cities, 2, :provinces => ["ZJ", "ZJ", "ZJ", "ZJ", "ZJ"])

| Row | cities | provinces | population | rainfall |
|-----|--------|-----------|------------|----------|
|     | String | String | Int64 | Float64 |
| 1 | Hangzhou | ZJ | 2000 | 51.0 |
| 2 | Ningbo | ZJ | 1500 | 62.0 |
| 3 | Wenzhou | ZJ | 1000 | 65.0 |
| 4 | Cixi | ZJ | 200 | 62.5 |
| 5 | Lishui | ZJ | 500 | 50.0 |
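
When every row receives the same value, typing it out once per row does not scale. A possible alternative, sketched here on a fresh copy of the data, builds the constant column with `fill` and `nrow`, so it keeps working if the number of rows changes.

```julia
using DataFrames

df = DataFrame(cities = ["Hangzhou", "Ningbo", "Wenzhou", "Cixi", "Lishui"],
               population = [2000, 1500, 1000, 200, 500],
               rainfall = [51.0, 62.0, 65.0, 62.5, 50.0])

# Insert a constant column at position 2; its length follows the table.
insertcols!(df, 2, :provinces => fill("ZJ", nrow(df)))
```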


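Check how many rows and columns the DataFrame has

`size` returns a `(rows, columns)` tuple. The output below comes from cell 15, which ran before the `provinces` column was inserted in cell 21, so it reports 3 columns.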

In [15]:
size(df_cities)

(5, 3)
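
If only one of the two dimensions is needed, DataFrames also provides `nrow` and `ncol`. A short sketch on a small table (the variable names are illustrative):

```julia
using DataFrames

df = DataFrame(cities = ["Hangzhou", "Ningbo", "Wenzhou"],
               population = [2000, 1500, 1000])

n_rows, n_cols = size(df)   # both dimensions at once
rows_only = nrow(df)        # number of rows
cols_only = ncol(df)        # number of columns
```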

Rename a column

The leading `:` indicates that the column name is a `Symbol`.

In [22]:
rename!(df_cities, :rainfall => :rain)

| Row | cities | provinces | population | rain |
|-----|--------|-----------|------------|------|
|     | String | String | Int64 | Float64 |
| 1 | Hangzhou | ZJ | 2000 | 51.0 |
| 2 | Ningbo | ZJ | 1500 | 62.0 |
| 3 | Wenzhou | ZJ | 1000 | 65.0 |
| 4 | Cixi | ZJ | 200 | 62.5 |
| 5 | Lishui | ZJ | 500 | 50.0 |
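
`rename!` changes the table in place. As a sketch of the non-mutating counterpart, `rename` (without `!`) returns a new DataFrame and leaves the original untouched; it also accepts several `old => new` pairs at once. The new name `:city` below is purely illustrative.

```julia
using DataFrames

df = DataFrame(cities = ["Hangzhou", "Ningbo"],
               rainfall = [51.0, 62.0])

# Returns a renamed copy; `df` keeps its original column names.
renamed = rename(df, :rainfall => :rain, :cities => :city)
```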


How to delete a row

First, append an extra row that can serve as the row to delete.

In [23]:
push!(df_cities, ["Leqing", "ZJ", 100, 40.0])

| Row | cities | provinces | population | rain |
|-----|--------|-----------|------------|------|
|     | String | String | Int64 | Float64 |
| 1 | Hangzhou | ZJ | 2000 | 51.0 |
| 2 | Ningbo | ZJ | 1500 | 62.0 |
| 3 | Wenzhou | ZJ | 1000 | 65.0 |
| 4 | Cixi | ZJ | 200 | 62.5 |
| 5 | Lishui | ZJ | 500 | 50.0 |
| 6 | Leqing | ZJ | 100 | 40.0 |
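
A row such as the one just added can be removed in more than one way. The following is a minimal sketch on a smaller table, not necessarily the approach this notebook takes: `filter` returns a new table without the matching rows, while `deleteat!` removes a row by position in place. `deleteat!` on a DataFrame assumes DataFrames 1.3 or newer; older releases named this function `delete!`.

```julia
using DataFrames

df = DataFrame(cities = ["Hangzhou", "Ningbo", "Leqing"],
               population = [2000, 1500, 100])

# Non-mutating: keep every row whose city is not "Leqing".
kept = filter(row -> row.cities != "Leqing", df)

# Mutating: delete the row at a given position (here the last one).
# Assumes DataFrames 1.3 or newer.
deleteat!(df, nrow(df))
```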
