In [1]:
import pandas as pd
import numpy as np

Have you ever needed to create a DataFrame of "dummy" data, but without reading from a file? 

In this notebook I'll show how to create a DataFrame from a dictionary, a list of lists, and a NumPy array, and then how to create a new Series and attach it to an existing DataFrame.

## Creating a DataFrame from a dictionary

In [2]:
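# creating a df from a dictionary: keys become column names, values become column data
# note: in older versions (Python < 3.6 or pandas < 0.23) dict keys were treated as unordered,
# so pandas sorted the columns instead of keeping the order in which the keys were written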
pd.DataFrame({'id': [1,2,3], 'color': ['red', 'green', 'yellow']})

|   | id | color |
|---|----|-------|
| 0 | 1 | red |
| 1 | 2 | green |
| 2 | 3 | yellow |
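A quick check of the ordering note above, assuming Python 3.7+ (where dict insertion order is guaranteed) and pandas 0.23 or later. The names `df_dict` and `df_flipped` are just for illustration.

```python
import pandas as pd

# With Python 3.7+ and pandas 0.23+, the columns follow the order
# in which the dict keys were written (no alphabetical sorting).
df_dict = pd.DataFrame({'id': [1, 2, 3], 'color': ['red', 'green', 'yellow']})
print(df_dict.columns.tolist())  # ['id', 'color']

# Writing the keys the other way round flips the column order too
df_flipped = pd.DataFrame({'color': ['red', 'green', 'yellow'], 'id': [1, 2, 3]})
print(df_flipped.columns.tolist())  # ['color', 'id']
```

So on a modern setup the `columns` argument below is mainly useful for choosing or reordering columns, not for fixing a random order.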


In [4]:
# to set the column order explicitly, pass the columns argument to the constructor
pd.DataFrame({'id': [1,2,3], 'color': ['red', 'green', 'yellow']}, columns = ['id', 'color'])

|   | id | color |
|---|----|-------|
| 0 | 1 | red |
| 1 | 2 | green |
| 2 | 3 | yellow |
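When the data is a dict, `columns` also acts as a selector: it picks keys in the order given, and a name that is not a key turns into an all-NaN column. A small sketch; `'size'` is a hypothetical column name used only to show the missing-key case.

```python
import pandas as pd

data = {'id': [1, 2, 3], 'color': ['red', 'green', 'yellow']}

# columns= picks dict keys in the order listed
reordered = pd.DataFrame(data, columns=['color', 'id'])
print(reordered.columns.tolist())  # ['color', 'id']

# 'size' is not a key of data, so it becomes an all-NaN column,
# and 'color' is left out because it was not listed
with_missing = pd.DataFrame(data, columns=['id', 'size'])
print(with_missing)
```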


In [6]:
# to use our own row labels instead of the default 0..n-1, pass the index argument
df = pd.DataFrame({'id': [1,2,3], 'color': ['red', 'green', 'yellow']}, columns = ['id', 'color'], index = ['a', 'b', 'c'])
df

|   | id | color |
|---|----|-------|
| a | 1 | red |
| b | 2 | green |
| c | 3 | yellow |
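If the data is naturally keyed by row label instead of by column, `DataFrame.from_dict` with `orient='index'` builds the same table. A minimal sketch, assuming pandas 0.23+ (where `from_dict` accepts `columns` together with `orient='index'`):

```python
import pandas as pd

# Each key becomes a row label; each list holds that row's values
rows = {'a': [1, 'red'], 'b': [2, 'green'], 'c': [3, 'yellow']}
df_rows = pd.DataFrame.from_dict(rows, orient='index', columns=['id', 'color'])
print(df_rows)
```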


## Creating a DataFrame from a list of lists

In [7]:
pd.DataFrame([[1,'red'], [2,'green'], [3, 'yellow']])

|   | 0 | 1 |
|---|---|---|
| 0 | 1 | red |
| 1 | 2 | green |
| 2 | 3 | yellow |
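Without `columns`, the columns are simply numbered 0, 1, …. A related pattern is a list of dicts, one dict per row, where the keys supply the column names. A sketch assuming a recent pandas (0.25+), where columns appear in order of first appearance; the third record deliberately omits `'color'` to show the NaN fill.

```python
import pandas as pd

# One dict per row; keys become column names
records = [
    {'id': 1, 'color': 'red'},
    {'id': 2, 'color': 'green'},
    {'id': 3},  # no 'color' key, so that cell becomes NaN
]
df_records = pd.DataFrame(records)
print(df_records)
```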


In [8]:
pd.DataFrame([[1,'red'], [2,'green'], [3, 'yellow']], columns = ['id', 'color'])

|   | id | color |
|---|----|-------|
| 0 | 1 | red |
| 1 | 2 | green |
| 2 | 3 | yellow |
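If the values start out as separate per-column lists, `zip` can pair them into row tuples first, which turns the problem back into the list-of-rows case above. The list names `ids` and `colors` are just for illustration.

```python
import pandas as pd

ids = [1, 2, 3]
colors = ['red', 'green', 'yellow']

# zip pairs the lists element-wise into (id, color) row tuples
df_zip = pd.DataFrame(list(zip(ids, colors)), columns=['id', 'color'])
print(df_zip)
```

For this particular case a dict, `{'id': ids, 'color': colors}`, is usually the more direct choice; `zip` is handy when rows are produced one at a time.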


## Creating a DataFrame from NumPy arrays

In [9]:
arr = np.random.rand(4,2)   # creates a 4x2 numpy array
arr

array([[0.1333498 , 0.3147259 ],
       [0.56568392, 0.36360278],
       [0.24549963, 0.74162569],
       [0.0412701 , 0.80745788]])

In [10]:
pd.DataFrame(arr, columns = ['one', 'two'])

|   | one | two |
|---|-----|-----|
| 0 | 0.13335 | 0.314726 |
| 1 | 0.565684 | 0.363603 |
| 2 | 0.2455 | 0.741626 |
| 3 | 0.04127 | 0.807458 |
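With a 2-D array, `columns` needs exactly one label per array column, and `index` can label the rows in the same way. A short sketch using a zero array so the shape is easy to see; the row labels `'w'`…`'z'` are hypothetical.

```python
import numpy as np
import pandas as pd

arr = np.zeros((4, 2))  # 4 rows x 2 columns

# One label per array column; index= optionally labels the rows
labelled = pd.DataFrame(arr, columns=['one', 'two'], index=['w', 'x', 'y', 'z'])
print(labelled.shape)  # (4, 2)

# Passing fewer column labels than the array has columns is rejected
mismatch_raised = False
try:
    pd.DataFrame(arr, columns=['one'])
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```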


In [12]:
pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60,101,10)})

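|   | student | test |
|---|---------|------|
| 0 | 100 | 66 |
| 1 | 101 | 96 |
| 2 | 102 | 98 |
| 3 | 103 | 77 |
| 4 | 104 | 74 |
| 5 | 105 | 79 |
| 6 | 106 | 76 |
| 7 | 107 | 80 |
| 8 | 108 | 76 |
| 9 | 109 | 99 |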
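Because `np.random.randint` draws new values on every run, the `test` scores change each time the cell is executed. For dummy data you want to reproduce, a seeded generator helps. A sketch assuming NumPy 1.17+ (for `np.random.default_rng`); `make_scores` is a hypothetical helper, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def make_scores(seed):
    # Seeded Generator: the same seed always gives the same scores
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'student': np.arange(100, 110),
        'test': rng.integers(60, 101, size=10),  # high is exclusive: 60..100
    })

scores = make_scores(42)
print(scores.head())
```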


In [13]:
# chain set_index() onto the constructor to use one of the columns as the index
# (randint draws new scores on each run, so the test values differ from the cell above)
pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60,101,10)}).set_index('student')

| student | test |
|---------|------|
| 100 | 66 |
| 101 | 96 |
| 102 | 68 |
| 103 | 76 |
| 104 | 74 |
| 105 | 70 |
| 106 | 90 |
| 107 | 90 |
| 108 | 92 |
| 109 | 94 |
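Chaining `set_index` is one way to get a labelled index; passing a named `pd.Index` to the constructor up front gives the same frame. A sketch comparing the two on identical data, assuming NumPy 1.17+ for `default_rng`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tests = rng.integers(60, 101, size=10)

# Option 1: build a 'student' column, then promote it to the index
a = pd.DataFrame({'student': np.arange(100, 110), 'test': tests}).set_index('student')

# Option 2: pass a named Index directly
b = pd.DataFrame({'test': tests}, index=pd.Index(np.arange(100, 110), name='student'))

print(a.equals(b))  # True
```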


## Creating a Series and attaching it to an existing DataFrame

In [14]:
s = pd.Series(['round', 'square'], index = ['c', 'b'], name = 'shape')
s

c     round
b    square
Name: shape, dtype: object
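The same Series can also be built from a dict, where the keys become the index labels in insertion order. A small sketch for comparison:

```python
import pandas as pd

# Dict keys become index labels; values become the Series data
s_from_dict = pd.Series({'c': 'round', 'b': 'square'}, name='shape')
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')

print(s_from_dict.equals(s))  # True
```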

In [15]:
df

|   | id | color |
|---|----|-------|
| a | 1 | red |
| b | 2 | green |
| c | 3 | yellow |


In [16]:
# concatenate the Series as a new column of the DataFrame (axis=1)
# the name of the Series becomes the column name
# the Series values are aligned on the index; 'a' has no match in s, so it gets NaN
pd.concat([df, s], axis = 1)

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'.


|   | id | color | shape |
|---|----|-------|-------|
| a | 1 | red | NaN |
| b | 2 | green | square |
| c | 3 | yellow | round |
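`pd.concat` is not the only way to attach the Series. A sketch of two alternatives, `DataFrame.join` and `DataFrame.assign`, which also align `s` on the index and keep `df`'s row order (a, b, c). It rebuilds `df` and `s` so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'color': ['red', 'green', 'yellow']},
                  index=['a', 'b', 'c'])
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')

# Both align s on df's index; 'a' has no match, so it gets NaN
joined = df.join(s)            # column name taken from s.name
assigned = df.assign(shape=s)  # column name taken from the keyword

print(joined)
print(joined.equals(assigned))  # True
```

Plain assignment, `df['shape'] = s`, aligns the same way but modifies `df` in place instead of returning a new DataFrame.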
