## Create a DataFrame

This section shares tips for reading data into a DataFrame or creating one.

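### Fix Unnamed: 0 When Reading a CSV in pandas

Sometimes, when reading a CSV in pandas, you will get an extra `Unnamed: 0` column. This usually happens when the file was saved with its index, for example by `DataFrame.to_csv` with the default `index=True`.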

In [12]:
import pandas as pd  

df = pd.read_csv('data2.csv')
print(df)

   Unnamed: 0  a  b
0           0  1  4
1           1  2  5
2           2  3  6


To fix this, pass `index_col=0` to `pandas.read_csv` so that the first column is used as the index.

In [14]:
df = pd.read_csv('data2.csv', index_col=0)
print(df)

   a  b
0  1  4
1  2  5
2  3  6
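If you control how the file is written, you can also avoid the extra column at the source. The sketch below is an illustration, not part of the original example: it writes a small DataFrame to an in-memory CSV string (instead of `data2.csv`) with and without the index, then reads both back to compare the columns.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Default (index=True): the index is written as an unnamed first column
csv_with_index = df.to_csv()

# index=False: only the data columns are written
csv_without_index = df.to_csv(index=False)

reread_with_index = pd.read_csv(StringIO(csv_with_index))
reread_without_index = pd.read_csv(StringIO(csv_without_index))

print(list(reread_with_index.columns))     # ['Unnamed: 0', 'a', 'b']
print(list(reread_without_index.columns))  # ['a', 'b']
```

Writing with `index=False` is usually the simpler choice when the index is just the default row numbers; keep the index and read it back with `index_col=0` when it carries meaning.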


### Read Data from a Website

pandas can read data directly from a URL, so you don't need to download the file first.

For example, to read a CSV from GitHub, click Raw then copy the link. 

![](../img/github_raw.png)

In [17]:
import pandas as pd  

df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/exercise.csv", index_col=0)

In [20]:
df.head(5)

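| | id | diet | pulse | time | kind |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | low fat | 85 | 1 min | rest |
| 1 | 1 | low fat | 85 | 15 min | rest |
| 2 | 1 | low fat | 88 | 30 min | rest |
| 3 | 2 | low fat | 90 | 1 min | rest |
| 4 | 2 | low fat | 92 | 15 min | rest |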
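If you only have the link to the file's page on GitHub, the raw link can often be derived from it instead of clicking Raw. The helper below is a hypothetical sketch: `github_blob_to_raw` is not a pandas or GitHub function, and the page URL is assumed from the raw link used in this section.

```python
def github_blob_to_raw(url: str) -> str:
    """Turn a GitHub file-page URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"Not a GitHub file-page URL: {url}")
    # "owner/repo/blob/branch/path" -> ("owner/repo", "branch/path")
    owner_repo, branch_and_path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{branch_and_path}"


# Assumed page URL for the file read in this section
page_url = "https://github.com/mwaskom/seaborn-data/blob/master/exercise.csv"
raw_url = github_blob_to_raw(page_url)
print(raw_url)
# https://raw.githubusercontent.com/mwaskom/seaborn-data/master/exercise.csv
```

The resulting `raw_url` can then be passed to `pd.read_csv` in the same way as the copied link.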


### Read HTML Tables Using Pandas

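If you want to quickly extract a table from a website and turn it into a pandas DataFrame, use `pd.read_html`. It returns a list of DataFrames, one per table found on the page, so you index the list to pick the table you want. In the code below, I extracted the tables from a Wikipedia page in one line of code.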

In [7]:
import pandas as pd  

# read_html returns a list of DataFrames, one per table on the page
tables = pd.read_html('https://en.wikipedia.org/wiki/Poverty')
tables[1]

| | Region | $1 per day | $1 per day | $1 per day | $1.25 per day[94] | $1.25 per day[94] | $1.90 per day[95] | $1.90 per day[95] | $1.90 per day[95] | $1.90 per day[95] | $1.90 per day[95] | $1.90 per day[95] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Region | 1990 | 2002 | 2004 | 1981 | 2008 | 1981 | 1990 | 2000 | 2010 | 2015 | 2018 |
| 0 | East Asia and Pacific | 15.4% | 12.3% | 9.1% | 77.2% | 14.3% | 80.2% | 60.9% | 34.8% | 10.8% | 2.1% | 1.2% |
| 1 | Europe and Central Asia | 3.6% | 1.3% | 1.0% | 1.9% | 0.5% | — | — | 7.3% | 2.4% | 1.5% | 1.1% |
| 2 | Latin America and the Caribbean | 9.6% | 9.1% | 8.6% | 11.9% | 6.5% | 13.7% | 15.5% | 12.7% | 6% | 3.7% | 3.7% |
| 3 | Middle East and North Africa | 2.1% | 1.7% | 1.5% | 9.6% | 2.7% | — | 6.5% | 3.5% | 2% | 4.3% | 7% |
| 4 | South Asia | 35.0% | 33.4% | 30.8% | 61.1% | 36% | 58% | 49.1% | — | 26% | — | — |
| 5 | Sub-Saharan Africa | 46.1% | 42.6% | 41.1% | 51.5% | 47.5% | — | 54.9% | 58.4% | 46.6% | 42.3% | 40.4% |
| 6 | World | — | — | — | 52.2% | 22.4% | 42.7% | 36.2% | 27.8% | 16% | 10.1% | — |
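When a page contains many tables, guessing the right list position can be fragile. As a minimal sketch, the example below runs `pd.read_html` on a small made-up HTML string (a stand-in for a real page, so no network access is needed) and uses the `match` parameter to keep only tables containing a given text. It assumes an HTML parser such as `lxml` is installed, which `pd.read_html` requires.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables, used only for illustration
html = """
<table>
  <thead><tr><th>Region</th><th>Rate</th></tr></thead>
  <tbody>
    <tr><td>North</td><td>10%</td></tr>
    <tr><td>South</td><td>20%</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Item</th><th>Count</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>3</td></tr>
  </tbody>
</table>
"""

# One DataFrame per table found in the document
tables = pd.read_html(StringIO(html))
print(len(tables))  # 2

# match keeps only tables whose text matches the string or regex
region_tables = pd.read_html(StringIO(html), match="Region")
region_df = region_tables[0]
print(list(region_df.columns))  # ['Region', 'Rate']
```

The same `match` argument works when the first argument is a URL, which can be more robust than a hard-coded index if the page layout changes.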


### DataFrame.copy(): Make a Copy of a DataFrame

Have you ever tried to make a copy of a DataFrame using `=`? Assignment does not create a copy: it only gives the same DataFrame a second name. Thus, changing the "new" DataFrame also changes the original one.

In [1]:
import pandas as pd 

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df  

| | col1 | col2 |
| --- | --- | --- |
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |


In [8]:
df2 = df
df2['col1'] = [7, 8, 9]
df

| | col1 | col2 |
| --- | --- | --- |
| 0 | 7 | 4 |
| 1 | 8 | 5 |
| 2 | 9 | 6 |


To get an independent copy, use `df.copy()`. Now, changing the copy will not affect the original DataFrame.

In [9]:
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# Create a copy of the original DataFrame
df3 = df.copy()

# Change the value of the copy
df3['col1'] = [7, 8, 9]

# Check if the original DataFrame has been changed
df

| | col1 | col2 |
| --- | --- | --- |
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
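One quick way to tell the two cases apart is Python's `is` operator, which checks whether two names refer to the same object. The sketch below uses the illustrative names `alias` and `clone`, which do not appear in the cells above.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

alias = df         # a second name for the same object
clone = df.copy()  # a separate object (deep=True is the default)

print(alias is df)  # True
print(clone is df)  # False

# Editing the clone leaves the original untouched
clone.loc[0, "col1"] = 100
print(df.loc[0, "col1"])     # 1
print(clone.loc[0, "col1"])  # 100
```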
