# Data Frame

**A Data Frame** is a **2D data structure** in which data is arranged in tabular form, with labeled **rows and columns**.

A Data Frame can be created using the following constructor:

df = pandas.DataFrame(data, index, columns, dtype, copy)

---

DataFrame accepts many different kinds of input:

*   Dict of 1D ndarrays, lists, dicts, or Series
*   2-D numpy.ndarray
*   Structured or record ndarray
*   A Series
*   Another DataFrame

Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.
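As a minimal sketch of the index and columns arguments, the snippet below builds a Data Frame from a 2-D ndarray. The labels `r1`, `r2`, `x`, `y` and `z` are made up for this illustration.

```python
import numpy as np
import pandas as pd

# A 2-D ndarray with 2 rows and 3 columns
values = np.array([[1, 2, 3], [4, 5, 6]])

# Hypothetical row and column labels, chosen only for this example
frame = pd.DataFrame(values, index=["r1", "r2"], columns=["x", "y", "z"])

print(frame)
print(frame.index.tolist())    # row labels
print(frame.columns.tolist())  # column labels
```

If index or columns is omitted, pandas falls back to default integer labels starting at 0.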

---



**Creating a Data Frame from a list**

---

A Python **list** can be passed directly to the constructor; each element becomes one row of a single-column **Data Frame**.

In [0]:
import numpy as np
import pandas as pd





In [3]:
listx = [10, 20, 30, 40, 50]

table = pd.DataFrame(listx)

print(table)

    0
0  10
1  20
2  30
3  40
4  50
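The column above is labeled `0` because no column name was given. A small sketch of naming it through the columns argument follows; the name `value` is an arbitrary choice for this example.

```python
import pandas as pd

listx = [10, 20, 30, 40, 50]

# Give the single column an explicit (hypothetical) name instead of the default 0
named = pd.DataFrame(listx, columns=["value"])

print(named)
```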


**Creating a Data Frame from a list of dictionaries**

---


In [15]:
data_list = [{'a':10, 'b':20},{'a':50, 'b':70},{'a':310, 'b':40}]

table = pd.DataFrame(data_list, index=('row-1','row-2','row-3'))

print(table)

         a   b
row-1   10  20
row-2   50  70
row-3  310  40


In [18]:
data_list = [{'a':10, 'b':20},{'a':20, 'b':30,'c':40}]
table = pd.DataFrame(data_list, index = ['first','second'])
print(table)

         a   b     c
first   10  20   NaN
second  20  30  40.0
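In the output above, the first dictionary has no `'c'` key, so that cell is filled with NaN and the column is stored as floating point. A short sketch for checking this, using only `dtypes` and `isna()`:

```python
import pandas as pd

data_list = [{'a': 10, 'b': 20}, {'a': 20, 'b': 30, 'c': 40}]
table = pd.DataFrame(data_list, index=['first', 'second'])

# Columns without missing values stay integer; 'c' becomes float to hold NaN
print(table.dtypes)

# True marks the cells that were missing in the input dictionaries
print(table['c'].isna())
```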


**Converting a dictionary of Series into a Data Frame:**

In [21]:
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


In [23]:
# Select and reorder rows by passing the index labels to keep

df2 = pd.DataFrame(d, index=['d', 'b', 'a'])

print(df2)

   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0


In [24]:
# Select rows with index and columns with columns; 'three' has no matching data, so it is filled with NaN

df = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
print(df)

# Note: the row and column labels are available through the index and columns attributes.

   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN
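The note about the index and columns attributes can be illustrated with a short sketch that rebuilds the same Data Frame and prints both attributes:

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

print(df.index)    # row labels
print(df.columns)  # column labels
```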


**Creating a Data Frame from a list of dicts (default integer index):**

In [26]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
table = pd.DataFrame(data2)
print(table)

   a   b     c
0  1   2   NaN
1  5  10  20.0


**Column selection, addition, deletion**


---

Adding a new column to an existing Data Frame

In [29]:
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
table = pd.DataFrame(d)

# Add a new column; values are aligned on the index, so row 'd' gets NaN

table["three"] = pd.Series([1, 2, 3], index = ['a', 'b', 'c'])

print(table)

   one  two  three
a  1.0  1.0    1.0
b  2.0  2.0    2.0
c  3.0  3.0    3.0
d  NaN  4.0    NaN


In [30]:
# Add a new column of boolean values (the comparison NaN > 2 evaluates to False)

table['flag'] = table["three"] > 2

print(table)

   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    2.0  False
c  3.0  3.0    3.0   True
d  NaN  4.0    NaN  False
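Column selection, the first item in this section's title, can be sketched as follows. A single label returns a Series, while a list of labels returns a Data Frame.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
table = pd.DataFrame(d)

one = table['one']              # single label -> Series
subset = table[['one', 'two']]  # list of labels -> DataFrame

print(type(one).__name__)
print(type(subset).__name__)
```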


**Columns can be deleted or popped like with a dict:**

---

A Data Frame column can be deleted using the **del statement** (del is a Python keyword, not a function):

In [31]:
del table['two']
print(table)

   one  three   flag
a  1.0    1.0  False
b  2.0    2.0  False
c  3.0    3.0   True
d  NaN    NaN  False


A Data Frame column can also be removed using the **pop() method**:

---

The **pop()** method of a Data Frame takes a column label, removes that column from the Data Frame in place, and returns it as a Series.

In [32]:
three = table.pop('three')

print(three)

print(table)

a    1.0
b    2.0
c    3.0
d    NaN
Name: three, dtype: float64
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False
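Both del and pop() modify the Data Frame in place. As a hedged alternative, `drop(columns=...)` returns a new Data Frame and leaves the original untouched. The small frame below is made up for this sketch.

```python
import pandas as pd

# Illustrative data only, not the table built earlier in this section
table = pd.DataFrame({'one': [1.0, 2.0],
                      'three': [1.0, 2.0],
                      'flag': [False, True]})

# drop() returns a copy without the listed columns
trimmed = table.drop(columns=['three'])

print(trimmed.columns.tolist())
print(table.columns.tolist())  # the original still has 'three'
```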


When inserting **a scalar value**, it will naturally be propagated to fill the column:

---


In [33]:
table["foo"] = "bar"
print(table)

   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar


**Indexing / Selection**

---

**The basics of indexing are as follows:**


---

| Operation | Syntax | Result |
| --- | --- | --- |
| Select column | **df[col]** | Series |
| Select row by label | **df.loc[label]** | Series |
| Select row by integer location | **df.iloc[loc]** | Series |
| Slice rows | **df[5:10]** | DataFrame |
| Select rows by boolean vector | **df[bool_vec]** | DataFrame |

---

Row selection, for example, returns a Series whose index is the columns of the DataFrame:

In [35]:
table.loc["a"]

one         1
flag    False
foo       bar
Name: a, dtype: object

In [36]:
table.iloc[2]

one        3
flag    True
foo      bar
Name: c, dtype: object
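The two operations that return a DataFrame, row slicing and selection by boolean vector, can be sketched the same way. The frame below is a small stand-in with the same `one` and `flag` columns.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'one': [1.0, 2.0, 3.0, np.nan],
                      'flag': [False, False, True, False]},
                     index=['a', 'b', 'c', 'd'])

first_two = table[0:2]          # slice rows by position -> DataFrame
flagged = table[table['flag']]  # boolean vector -> DataFrame

print(first_two)
print(flagged)
```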

**Data Frame - Row Addition:**

---

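In older pandas releases, the **append() method** could be used to add one or more rows to a Data Frame. It was deprecated in pandas 1.4 and removed in pandas 2.0, where **pd.concat()** takes its place. The cell below was run on an older release: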


In [37]:
print(table)
print("----------------------------------------")
row = pd.DataFrame([[1, True], [3, False]], columns=['one', 'flag'])  # booleans rather than the strings 'True'/'False'
table1= table.append(row)
print(table1)

   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar
----------------------------------------
    flag  foo  one
a  False  bar  1.0
b  False  bar  2.0
c   True  bar  3.0
d  False  bar  NaN
0   True  NaN  1.0
1  False  NaN  3.0
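On pandas 2.0 and later, where `append()` is no longer available, the same row addition can be sketched with `pd.concat()`. The two-row `table` here is a shortened stand-in for the one used in this section.

```python
import pandas as pd

table = pd.DataFrame({'one': [1.0, 2.0],
                      'flag': [False, True],
                      'foo': ['bar', 'bar']},
                     index=['a', 'b'])
row = pd.DataFrame([[1, True], [3, False]], columns=['one', 'flag'])

# Stack the rows of both frames; columns missing from 'row' are filled with NaN
table1 = pd.concat([table, row])

print(table1)
```

Passing `ignore_index=True` to `pd.concat()` would replace the mixed labels with a fresh integer index.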




**Data Frame - Row Deletion:**

---

The **drop() method** removes the rows whose labels are passed to it and returns a new Data Frame.


In [38]:
table1 = table1.drop('d')
print(table1)

    flag  foo  one
a  False  bar  1.0
b  False  bar  2.0
c   True  bar  3.0
0   True  NaN  1.0
1  False  NaN  3.0
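As a further sketch, drop() also accepts a list of labels, and `errors='ignore'` skips labels that do not exist instead of raising a KeyError. The frame and the label `'z'` are made up for this example.

```python
import pandas as pd

# Illustrative data only
table1 = pd.DataFrame({'one': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])

smaller = table1.drop(['a', 'c'])             # drop several rows at once
same = table1.drop('z', errors='ignore')      # 'z' is absent, nothing is dropped

print(smaller)
print(same)
```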


**Data alignment and arithmetic:**

---

Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels). The resulting object has the union of the column and row labels, and any position that is missing from either operand is filled with NaN.

In [39]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

print(df + df2)

          A         B         C   D
0 -0.628508  1.725136  1.105064 NaN
1  1.302424 -1.212615 -2.771346 NaN
2 -0.319537  0.230366  0.203374 NaN
3 -1.371339 -0.303615  0.956477 NaN
4 -0.763576  1.085715  0.500240 NaN
5 -0.448892 -0.516872  0.918055 NaN
6  0.032101 -0.788190  0.504626 NaN
7       NaN       NaN       NaN NaN
8       NaN       NaN       NaN NaN
9       NaN       NaN       NaN NaN
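Because the example above uses random numbers, the alignment rule is easier to check on a small deterministic case. The sketch below also shows `add()` with `fill_value=0`, one possible way to treat a value missing from one side as zero; the frames `left` and `right` are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
right = pd.DataFrame({'A': [100, 200]})

# Plain addition: labels present in only one operand produce NaN
total = left + right

# add() with fill_value substitutes 0 where exactly one side is missing
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```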


**When doing an operation between DataFrame and Series, the default behavior is to align the Series index on the DataFrame columns, thus broadcasting row-wise.** 

---


For example:

In [40]:
print(df - df.iloc[0])

          A         B         C         D
0  0.000000  0.000000  0.000000  0.000000
1  0.824768 -2.460827 -2.676223  1.011748
2  0.349232  0.272975 -0.272810 -1.182524
3  0.547732 -0.575034  1.405389 -1.511950
4  1.119286 -0.832077  0.629325 -0.509567
5  0.499502 -0.751341  0.890200 -1.372174
6  0.535682 -1.225824 -1.347111  0.441199
7 -0.414519 -1.534890 -1.967859 -0.849740
8 -0.260617 -0.657236 -0.158754 -0.689994
9  0.491952  0.368275  1.144435 -1.025913
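A deterministic sketch of the same behavior follows. It also shows the `sub()` method with `axis=0`, which matches the Series index against the row labels instead, so the Series is broadcast column-wise. The small frame is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Default: the Series index (A, B) is matched against the columns,
# so the first row is subtracted from every row
row_wise = df - df.iloc[0]

# axis=0: the Series index (0, 1, 2) is matched against the row labels,
# so column A is subtracted from every column
col_wise = df.sub(df['A'], axis=0)

print(row_wise)
print(col_wise)
```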
