# Creating a Series and a DataFrame
Creating a Series by passing a list of values, letting pandas create a default integer index:

In [2]:
import pandas as pd
import numpy as np

s = pd.Series([1, 3, 5, np.nan, 6, 7, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    7.0
6    8.0
dtype: float64
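The output above is float64 even though most inputs are integers. A minimal sketch of why, assuming nothing beyond pandas and NumPy (the variable names are illustrative only): np.nan is a float, so a single missing value upcasts the whole Series.

```python
import numpy as np
import pandas as pd

# Integers only: pandas keeps an integer dtype.
ints = pd.Series([1, 3, 5])

# np.nan is a float, so one missing value upcasts the Series to float64.
with_nan = pd.Series([1, 3, 5, np.nan])

# Passing index= replaces the default 0..n-1 integer index.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])

print(ints.dtype, with_nan.dtype, list(labeled.index))
```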

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [12]:
dates = pd.date_range("20120601", periods=5)
dates

DatetimeIndex(['2012-06-01', '2012-06-02', '2012-06-03', '2012-06-04',
               '2012-06-05'],
              dtype='datetime64[ns]', freq='D')

In [14]:
df = pd.DataFrame(np.random.randn(5, 4), index=dates, columns=list("ABCD"))
df

                   A         B         C         D
2012-06-01 -0.949676  1.709492 -1.257900 -0.593412
2012-06-02 -1.562115 -0.663672 -1.981759 -0.748058
2012-06-03  1.266012  0.823646 -0.460009 -1.356476
2012-06-04  0.112586 -0.287793 -1.614680  0.810569
2012-06-05 -0.652131 -0.191086 -0.289857 -0.424241


In [15]:
df.dtypes

A    float64
B    float64
C    float64
D    float64
dtype: object
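Because np.random.randn draws fresh values on every run, the numbers shown above will differ when the cells are re-executed. One possible way to make the frame reproducible is a seeded NumPy Generator; the seed value 0 and the names rng and same are arbitrary choices for this sketch, not part of the original notebook.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20120601", periods=5)

# A seeded Generator yields the same draws on every run.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.standard_normal((5, 4)), index=dates, columns=list("ABCD"))

# Rebuilding with the same seed reproduces the frame exactly.
same = pd.DataFrame(
    np.random.default_rng(seed=0).standard_normal((5, 4)),
    index=dates,
    columns=list("ABCD"),
)
print(df.equals(same))
```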

Creating a DataFrame by passing a dictionary of objects that can be converted into a Series-like structure. Scalar values such as 1.0 and "foo" are broadcast to every row:

In [17]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20220601"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

     A          B    C  D      E    F
0  1.0 2022-06-01  1.0  3   test  foo
1  1.0 2022-06-01  1.0  3  train  foo
2  1.0 2022-06-01  1.0  3   test  foo
3  1.0 2022-06-01  1.0  3  train  foo


In [18]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
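The dictionary constructor can be tried on a smaller frame to see how each kind of value is handled. This is a sketch with made-up column names (scalar, series, array): the Series supplies the row index, the scalar is repeated for every row, and the array keeps its own dtype.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(
    {
        "scalar": 1.0,                                 # repeated for every row
        "series": pd.Series([10, 20, 30]),             # supplies the index 0, 1, 2
        "array": np.array([7, 8, 9], dtype="int32"),   # dtype is preserved
    }
)

print(small)
print(small.dtypes)
```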

# Viewing data
Displaying the index and the columns:

In [21]:
df2.index

Int64Index([0, 1, 2, 3], dtype='int64')

In [22]:
df2.columns

Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

Getting a NumPy representation of the underlying data. Because df2 mixes several dtypes, the result is an array of dtype object:

In [24]:
df2.to_numpy()

array([[1.0, Timestamp('2022-06-01 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2022-06-01 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2022-06-01 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2022-06-01 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
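A NumPy array has a single dtype, so to_numpy() has to find one that can hold every column. The following sketch, using two small hypothetical frames, contrasts a homogeneous frame with a mixed one.

```python
import pandas as pd

# All columns are floats: the array keeps float64.
numeric = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

# Floats next to strings: the only common dtype is object.
mixed = pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]})

print(numeric.to_numpy().dtype)
print(mixed.to_numpy().dtype)
```

Converting a mixed frame is comparatively expensive, since every value ends up as a Python object.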

Showing a quick statistical summary of your data (here only the numeric columns A, C and D are summarized):

In [26]:
df2.describe()

         A    C    D
count  4.0  4.0  4.0
mean   1.0  1.0  3.0
std    0.0  0.0  0.0
min    1.0  1.0  3.0
25%    1.0  1.0  3.0
50%    1.0  1.0  3.0
75%    1.0  1.0  3.0
max    1.0  1.0  3.0
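Since every column of df2 is constant, the summary above is not very informative. A small sketch on a hypothetical frame named scores shows that the rows of describe() match the corresponding Series methods, and that include="all" also brings in non-numeric columns.

```python
import pandas as pd

scores = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "name": ["a", "b", "c", "d"]})

# By default only numeric columns are summarized.
summary = scores.describe()
print(list(summary.columns))

# Each row of the summary agrees with the matching Series method.
print(summary.loc["mean", "x"], scores["x"].mean())
print(summary.loc["std", "x"], scores["x"].std())

# include="all" adds the non-numeric column as well.
all_cols = scores.describe(include="all")
print(list(all_cols.columns))
```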


Transposing your data:

In [29]:
df.T

   2012-06-01  2012-06-02  2012-06-03  2012-06-04  2012-06-05
A   -0.949676   -1.562115    1.266012    0.112586   -0.652131
B    1.709492   -0.663672    0.823646   -0.287793   -0.191086
C   -1.257900   -1.981759   -0.460009   -1.614680   -0.289857
D   -0.593412   -0.748058   -1.356476    0.810569   -0.424241


Sorting by an axis (here the column labels, in descending order):

In [32]:
df.sort_index(axis=1, ascending=False)

                   D         C         B         A
2012-06-01 -0.593412 -1.257900  1.709492 -0.949676
2012-06-02 -0.748058 -1.981759 -0.663672 -1.562115
2012-06-03 -1.356476 -0.460009  0.823646  1.266012
2012-06-04  0.810569 -1.614680 -0.287793  0.112586
2012-06-05 -0.424241 -0.289857 -0.191086 -0.652131


Sorting by values:

In [35]:
df.sort_values(by="D")

                   A         B         C         D
2012-06-03  1.266012  0.823646 -0.460009 -1.356476
2012-06-02 -1.562115 -0.663672 -1.981759 -0.748058
2012-06-01 -0.949676  1.709492 -1.257900 -0.593412
2012-06-05 -0.652131 -0.191086 -0.289857 -0.424241
2012-06-04  0.112586 -0.287793 -1.614680  0.810569
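Both sorting methods return a new object and leave the original frame untouched. The sketch below uses a tiny hypothetical frame with string row labels so the resulting order is easy to read.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [3, 1, 2], "D": [0.5, -1.0, 0.0]},
    index=["r0", "r1", "r2"],
)

by_d = df.sort_values(by="D")                        # ascending by default
by_d_desc = df.sort_values(by="D", ascending=False)  # largest D first
cols_desc = df.sort_index(axis=1, ascending=False)   # column labels, reversed

print(list(by_d.index))
print(list(by_d_desc.index))
print(list(cols_desc.columns))
print(list(df.index))  # the original order is unchanged
```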


# Selection
Selecting a single column, which yields a Series:

In [41]:
df["B"]

2012-06-01    1.709492
2012-06-02   -0.663672
2012-06-03    0.823646
2012-06-04   -0.287793
2012-06-05   -0.191086
Freq: D, Name: B, dtype: float64
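The type of the result depends on what is passed inside the brackets. As a sketch on a small hypothetical frame: a single label gives a Series, attribute access gives the same Series when the label is a valid Python identifier, and a list of labels gives a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["A", "B"])

col = df["B"]      # single label: a Series named "B"
same = df.B        # attribute access returns the same column
frame = df[["B"]]  # list of labels: a one-column DataFrame

print(type(col).__name__, col.name)
print(col.equals(same))
print(type(frame).__name__, frame.shape)
```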

Selecting via [] with a slice, which slices the rows, either by position or by label:

In [45]:
df[0:2]

                   A         B         C         D
2012-06-01 -0.949676  1.709492 -1.257900 -0.593412
2012-06-02 -1.562115 -0.663672 -1.981759 -0.748058


In [51]:
df["20120602":"20120604"]

                   A         B         C         D
2012-06-02 -1.562115 -0.663672 -1.981759 -0.748058
2012-06-03  1.266012  0.823646 -0.460009 -1.356476
2012-06-04  0.112586 -0.287793 -1.614680  0.810569


Selection by label: getting a cross section using a label

In [53]:
df.loc[dates[1]]

A   -1.562115
B   -0.663672
C   -1.981759
D   -0.748058
Name: 2012-06-02 00:00:00, dtype: float64

Selecting on a multi-axis by label:

In [55]:
df.loc[:, ["C", "D"]]

                   C         D
2012-06-01 -1.257900 -0.593412
2012-06-02 -1.981759 -0.748058
2012-06-03 -0.460009 -1.356476
2012-06-04 -1.614680  0.810569
2012-06-05 -0.289857 -0.424241


When slicing by label, both endpoints are included:

In [56]:
df.loc["20120602":"20120604", ["A", "B"]]

                   A         B
2012-06-02 -1.562115 -0.663672
2012-06-03  1.266012  0.823646
2012-06-04  0.112586 -0.287793
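The inclusive endpoint is specific to label-based slicing; positional slices follow the usual Python convention and exclude the stop position. A minimal sketch, assuming a frame filled with np.arange values so the rows are easy to tell apart:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20120601", periods=5)
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=dates, columns=list("ABCD"))

# Label slice: 2012-06-02 through 2012-06-04, both included (3 rows).
by_label = df.loc["20120602":"20120604", ["A", "B"]]

# Positional slice: rows 1 and 2 only, the stop position 3 is excluded.
by_position = df.iloc[1:3, 0:2]

print(len(by_label), len(by_position))
```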


Selecting a single row label reduces the dimensions of the returned object, giving a Series instead of a DataFrame:

In [58]:
df.loc["20120603", ["A", "B"]]

A    1.266012
B    0.823646
Name: 2012-06-03 00:00:00, dtype: float64
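If the two-dimensional shape should be kept, one option is to wrap the row label in a list. This sketch rebuilds a comparable frame with np.arange values; the names row and one_row are illustrative only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20120601", periods=5)
df = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=dates, columns=list("ABCD"))

# Scalar row label: the result drops to a Series named after the row.
row = df.loc[dates[2], ["A", "B"]]

# Row label inside a list: the result stays a 1 x 2 DataFrame.
one_row = df.loc[[dates[2]], ["A", "B"]]

print(type(row).__name__, row.name)
print(type(one_row).__name__, one_row.shape)
```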

For getting a scalar value:

In [62]:
df.loc[dates[3], "A"]

0.11258634517809715

For getting fast access to a scalar (equivalent to the prior method):

In [64]:
df.at[dates[3], "A"]

0.11258634517809715
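The equivalence of the two accessors can be checked directly. In this sketch the frame holds np.arange values, so the expected scalar is known in advance; at accepts exactly one row label and one column label, whereas loc also accepts slices and lists.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20120601", periods=5)
df = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=dates, columns=list("ABCD"))

via_loc = df.loc[dates[3], "A"]  # general label-based indexer
via_at = df.at[dates[3], "A"]    # scalar-only accessor

print(via_loc, via_at, via_loc == via_at)
```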

Selection by position: selecting via the position of the passed integers

In [65]:
df.iloc[3]

A    0.112586
B   -0.287793
C   -1.614680
D    0.810569
Name: 2012-06-04 00:00:00, dtype: float64

For slicing rows explicitly:

In [67]:
df.iloc[1:4, :]

                   A         B         C         D
2012-06-02 -1.562115 -0.663672 -1.981759 -0.748058
2012-06-03  1.266012  0.823646 -0.460009 -1.356476
2012-06-04  0.112586 -0.287793 -1.614680  0.810569


For slicing columns explicitly:

In [75]:
df.iloc[:, 1:3]

                   B         C
2012-06-01  1.709492 -1.257900
2012-06-02 -0.663672 -1.981759
2012-06-03  0.823646 -0.460009
2012-06-04 -0.287793 -1.614680
2012-06-05 -0.191086 -0.289857


For getting a value explicitly

In [78]:
df.iloc[2, 2]

-0.4600094259436381

For getting fast access to a scalar (equivalent to the prior method):

In [80]:
df.iat[2, 2]

-0.4600094259436381
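Positions and labels address the same cells, so a positional lookup can be translated into a label lookup through df.index and df.columns. A short sketch, again on a frame of np.arange values (row_label and col_label are names chosen for this example):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20120601", periods=5)
df = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=dates, columns=list("ABCD"))

# Position (2, 2) is the third row and the third column.
row_label = df.index[2]
col_label = df.columns[2]

by_position = df.iat[2, 2]
by_label = df.at[row_label, col_label]

print(row_label.date(), col_label, by_position, by_label)
```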