## Object Creation

In [1]:
import numpy as np
import pandas as pd

Create a **Series** by passing a list of values and letting pandas create a default integer index. Here `np.nan` (Not a Number) marks a missing value:

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])

s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
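
The `float64` dtype above follows from the missing value: `np.nan` is itself a float, so pandas upcasts the integers around it. A minimal sketch of that behaviour, where the names `s_int` and `s_nan` are chosen only for illustration:

```python
import numpy as np
import pandas as pd

# With no missing values, pandas infers an integer dtype from the list.
s_int = pd.Series([1, 3, 5, 6, 8])
print(s_int.dtype)

# np.nan is a float, so including it upcasts the whole Series to float64.
s_nan = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s_nan.dtype)

# isna() flags the missing entries; summing the flags counts them.
print(s_nan.isna().sum())
```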

Create a **DataFrame** by passing a NumPy array, with a datetime index and labeled columns:
    

In [5]:
dates = pd.date_range('20190101', periods=6)
dates

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')

In [6]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2019-01-01 | -0.236863 | 0.535296  | -0.425781 | 0.249483  |
| 2019-01-02 | 1.67476   | 0.331249  | 0.931249  | 1.09336   |
| 2019-01-03 | 0.418508  | 0.941258  | 1.257385  | -1.018234 |
| 2019-01-04 | 1.392768  | -0.540825 | 0.471702  | -0.206771 |
| 2019-01-05 | -1.441373 | -2.476642 | 0.50265   | -0.562518 |
| 2019-01-06 | 0.26234   | 1.184995  | 0.84721   | 1.19041   |
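
Because `np.random.randn` draws fresh random numbers, the values above change on every run. If reproducible output is wanted, one option is a seeded NumPy `Generator`. This is a sketch rather than what the notebook does: the seed value `0` and the name `df_seeded` are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run (seed chosen arbitrarily).
rng = np.random.default_rng(seed=0)

dates = pd.date_range('20190101', periods=6)
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)),  # 6 rows x 4 columns of standard-normal draws
    index=dates,
    columns=list('ABCD'),
)
print(df_seeded.shape)
```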


***

Create a **DataFrame** by passing a dict of objects that can each be converted to a Series-like structure:

In [13]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})


In [14]:
df2

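|   | A   | B          | C   | D | E     | F   |
|---|-----|------------|-----|---|-------|-----|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test  | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test  | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |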
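
The scalars in the dict (`1.`, the timestamp, `'foo'`) are repeated on every row because the array-like values fix the number of rows. The sketch below illustrates this with a smaller, hypothetical frame `df_small`, and shows that a dict made only of scalars is rejected unless an index is supplied:

```python
import pandas as pd

# Scalars are broadcast to the length set by the Series in column 'C'.
df_small = pd.DataFrame({
    'A': 1.0,
    'C': pd.Series(1, index=range(4), dtype='float32'),
    'F': 'foo',
})
print(len(df_small))

# With scalars only, pandas cannot infer a length and raises ValueError.
scalar_error = None
try:
    pd.DataFrame({'A': 1.0, 'F': 'foo'})
except ValueError as err:
    scalar_error = err
    print(type(err).__name__)

# Passing an explicit index resolves the ambiguity.
df_indexed = pd.DataFrame({'A': 1.0, 'F': 'foo'}, index=range(2))
print(len(df_indexed))
```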


The columns of the resulting **DataFrame** have different dtypes.

In [16]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
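
Per-column dtypes also make it possible to pick columns by type. As a small sketch, `DataFrame.select_dtypes` can keep only the numeric columns; the frame is rebuilt here so the example runs on its own, and the name `numeric` is illustrative:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# Keep only float and integer columns; datetime, category, and string columns are dropped.
numeric = df2.select_dtypes(include='number')
print(list(numeric.columns))
```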

****
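If you're using IPython, tab completion for column names (as well as public attributes) is automatically enabled.  
Type `df2.` and press Tab to see the attributes that can be completed. In the cell below, `<TAB>` stands for that keypress; the line is not valid Python, so executing it as written raises the error shown: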

In [24]:
df2.<TAB>  # noqa: E225, E999

SyntaxError: invalid syntax (<ipython-input-24-2e62111f8e28>, line 1)
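
Outside an interactive shell, a similar list can be produced with Python's built-in `dir()`. This is only an approximation of what the completer offers, and the small frame and the name `public` are assumptions made for the example:

```python
import pandas as pd

# Small stand-in frame; any DataFrame with string column names works.
df2 = pd.DataFrame({'A': [1.0, 1.0], 'F': ['foo', 'foo']})

# dir() lists attributes; dropping names that start with '_' leaves the public ones.
public = [name for name in dir(df2) if not name.startswith('_')]

# Column names that are valid Python identifiers appear next to methods such as head.
print('A' in public, 'head' in public)
```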

### Viewing Data
Here is how to view the top and bottom rows of the frame:

In [27]:
df.head() # 5 rows by default

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2019-01-01 | -0.236863 | 0.535296  | -0.425781 | 0.249483  |
| 2019-01-02 | 1.67476   | 0.331249  | 0.931249  | 1.09336   |
| 2019-01-03 | 0.418508  | 0.941258  | 1.257385  | -1.018234 |
| 2019-01-04 | 1.392768  | -0.540825 | 0.471702  | -0.206771 |
| 2019-01-05 | -1.441373 | -2.476642 | 0.50265   | -0.562518 |


In [28]:
df.tail(3) # 3 rows specified

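|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 2019-01-04 | 1.392768  | -0.540825 | 0.471702  | -0.206771 |
| 2019-01-05 | -1.441373 | -2.476642 | 0.50265   | -0.562518 |
| 2019-01-06 | 0.26234   | 1.184995  | 0.84721   | 1.19041   |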
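
Both methods take an optional row count `n`, and a negative `n` means "all rows except that many". A short sketch on a frame of the same shape, filled with seeded random values since the notebook's exact numbers are not reproducible:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, for illustration only
dates = pd.date_range('20190101', periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

print(len(df.head()))    # default n=5
print(len(df.tail(3)))   # last 3 rows
print(len(df.head(-2)))  # everything except the last 2 rows
```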


Display the index and the columns:

In [30]:
df.index

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')

In [31]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

**DataFrame.to_numpy()** gives a NumPy representation of the underlying data. Note that this can be an expensive operation when your **DataFrame** has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column. When you call **DataFrame.to_numpy()**, pandas will find the NumPy dtype that can hold all of the dtypes in the DataFrame. This may end up being object, which requires casting every value to a Python object.
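
The dtype rule can be checked directly. The sketch below uses three small, hypothetical frames (`all_float`, `int_and_float`, `mixed`) and prints the dtype of the array that `to_numpy()` returns for each:

```python
import numpy as np
import pandas as pd

# All columns float64: the array keeps float64 and no per-value casting is needed.
all_float = pd.DataFrame(np.ones((3, 2)), columns=['A', 'B'])
print(all_float.to_numpy().dtype)

# int32 and float64 columns: float64 can hold both, so that is the common dtype.
int_and_float = pd.DataFrame({'i': np.array([1, 2], dtype='int32'),
                              'f': [0.5, 1.5]})
print(int_and_float.to_numpy().dtype)

# Numbers and strings share no numeric dtype, so the result falls back to object.
mixed = pd.DataFrame({'A': [1.0, 2.0], 'F': ['foo', 'bar']})
print(mixed.to_numpy().dtype)
```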