
In [35]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline
from skimage.io import imread

# Difference between ndarray and series object
# Difference between Numpy and Pandas

There are some differences worth noting between ndarrays and Series objects. First, elements in NumPy arrays are accessed by their integer position, starting with zero for the first element. A pandas Series is more flexible because you can define your own labeled index and use it to access elements. The labels can be letters instead of numbers, or numbers in descending rather than ascending order. Second, Series objects align data by label, which makes combining different Series and handling missing values more convenient than with ndarrays. If a label has no match during alignment, pandas returns NaN (not a number) for it so that the operation does not fail.

While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.
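The differences described above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook; the names `arr`, `s1`, `s2`, and `total` are made up for the example.

```python
import numpy as np
import pandas as pd

# An ndarray is addressed by integer position only
arr = np.array([10, 20, 30])
first = arr[0]

# A Series can carry user-defined labels, here numbers in descending order
s1 = pd.Series([10, 20, 30], index=[3, 2, 1])
by_label = s1.loc[3]   # looks up the label 3, not the position 3

# Arithmetic aligns on labels; labels without a match yield NaN, not an error
s2 = pd.Series([1, 2, 3], index=[1, 2, 4])
total = s1 + s2
print(total)
```

Labels 1 and 2 exist in both Series and are added; labels 3 and 4 exist in only one of them, so the result holds NaN there.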

In [36]:
# Series
obj = pd.Series([1,-2,7,5,3])
obj

0    1
1   -2
2    7
3    5
4    3
dtype: int64

In [37]:
obj.values

array([ 1, -2,  7,  5,  3])

In [38]:
obj.index

RangeIndex(start=0, stop=5, step=1)

In [39]:
# Pandas object with custom index
obj2 = pd.Series([1,-2,7,5,3], ['a','b','c','d','e'])
obj2

a    1
b   -2
c    7
d    5
e    3
dtype: int64

In [40]:
obj2.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [41]:
obj2.values

array([ 1, -2,  7,  5,  3])

In [42]:
# Here ['c', 'a', 'd'] is interpreted as a list of indices, even though it contains
# strings instead of integers.
obj2[['c', 'a', 'd']]

c    7
a    1
d    5
dtype: int64

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values. It can be used in many contexts where you might use a dict:

In [43]:
'f' in obj2

False

In [44]:
'c' in obj2

True

In [45]:
# Should you have data contained in a Python dict, you can create a Series from it by passing the dict:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata)
obj4

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

When you pass only a dict, the index of the resulting Series follows the dict’s key order (insertion order in recent pandas versions, as in the output above; older versions sorted the keys). You can override this by passing the dict keys in the order you want them to appear in the resulting Series:

In [46]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [47]:
# The isnull and notnull functions in pandas should be used to detect missing data:
pd.isnull(obj4)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [48]:
pd.notnull(obj4)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

In [49]:
obj2 + obj4

California   NaN
Ohio         NaN
Oregon       NaN
Texas        NaN
a            NaN
b            NaN
c            NaN
d            NaN
e            NaN
dtype: float64

A Series’s index can be altered in-place by assignment:

In [50]:
print(obj)

0    1
1   -2
2    7
3    5
4    3
dtype: int64
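As a minimal sketch of index assignment, the snippet below works on a separate Series with the same values as obj, so the notebook's own variables are untouched. The name `obj_demo` and the new labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in with the same values as obj above
obj_demo = pd.Series([1, -2, 7, 5, 3])

# Assigning to .index replaces the default RangeIndex in place
obj_demo.index = ['a', 'b', 'c', 'd', 'e']
print(obj_demo)

# The new index must have the same length as the data
try:
    obj_demo.index = ['x', 'y']
    length_checked = False
except ValueError as exc:
    length_checked = True
    print('ValueError:', exc)
```

The values stay where they are; only the labels change. An index of the wrong length is rejected with a ValueError.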




# DataFrame

A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row and column index; it can be thought of as a dict of Series all sharing the same index. Under the hood, the data is stored as one or more two-dimensional blocks rather than a list, dict, or some other collection of one-dimensional arrays.
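The "dict of Series sharing the same index" view can be made concrete with a short sketch. The names `pop_s`, `state_s`, and `df_demo` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Two Series with different value types and partly different labels
pop_s = pd.Series([1.5, 1.7, 3.6], index=[2000, 2001, 2002])
state_s = pd.Series(['Ohio', 'Ohio'], index=[2000, 2001])

# Each Series becomes a column; the row index is the union of their labels
df_demo = pd.DataFrame({'state': state_s, 'pop': pop_s})
print(df_demo)
print(df_demo.dtypes)
```

The label 2002 is missing from state_s, so that cell is filled with a missing value while the pop column keeps its float values.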

In [52]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002, 2003],
 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
frame

Unnamed: 0,state,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9
5,Nevada,2003,3.2


In [53]:
## For large DataFrames, the head method selects only the first five rows:
frame.head()

Unnamed: 0,state,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9


In [54]:
## If you specify a sequence of columns, the DataFrame’s columns will be arranged in that order:
pd.DataFrame(data, columns=['year', 'state', 'pop'])

Unnamed: 0,year,state,pop
0,2000,Ohio,1.5
1,2001,Ohio,1.7
2,2002,Ohio,3.6
3,2001,Nevada,2.4
4,2002,Nevada,2.9
5,2003,Nevada,3.2


In [55]:
## If you pass a column that isn’t contained in the dict, it will appear with missing values in the result:
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'dept'], index=['one','two','three','four','five','six'])

# Filling the NaN columns (more on this later...)

In [56]:
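# Columns can be modified by assignment: a column can be assigned a scalar value or an array of values.
# Note that frame2 was built with an empty column named 'dept', so assigning to 'debt' below creates a new column instead of filling the empty one: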

In [57]:
from pandas.core.frame import DataFrame
frame2['debt'] = 16.5
frame2

Unnamed: 0,year,state,pop,dept,debt
one,2000,Ohio,1.5,,16.5
two,2001,Ohio,1.7,,16.5
three,2002,Ohio,3.6,,16.5
four,2001,Nevada,2.4,,16.5
five,2002,Nevada,2.9,,16.5
six,2003,Nevada,3.2,,16.5


In [58]:
frame2['dept'] = np.arange(6.)
frame2

Unnamed: 0,year,state,pop,dept,debt
one,2000,Ohio,1.5,0.0,16.5
two,2001,Ohio,1.7,1.0,16.5
three,2002,Ohio,3.6,2.0,16.5
four,2001,Nevada,2.4,3.0,16.5
five,2002,Nevada,2.9,4.0,16.5
six,2003,Nevada,3.2,5.0,16.5


In [59]:
#When you are assigning lists or arrays to a column, the value’s length must match the length of the DataFrame. 
#If you assign a Series, its labels will be realigned exactly to the DataFrame’s index, inserting missing values in any holes:
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
frame2

Unnamed: 0,year,state,pop,dept,debt
one,2000,Ohio,1.5,0.0,
two,2001,Ohio,1.7,1.0,-1.2
three,2002,Ohio,3.6,2.0,
four,2001,Nevada,2.4,3.0,-1.5
five,2002,Nevada,2.9,4.0,-1.7
six,2003,Nevada,3.2,5.0,


In [60]:
#Assigning a column that doesn’t exist will create a new column. 
#The del keyword will delete columns as with a dict. As an example of del, 
#I first add a new column of boolean values where the state column equals 'Ohio':
frame2['eastern'] = frame2.state == 'Ohio'
frame2

Unnamed: 0,year,state,pop,dept,debt,eastern
one,2000,Ohio,1.5,0.0,,True
two,2001,Ohio,1.7,1.0,-1.2,True
three,2002,Ohio,3.6,2.0,,True
four,2001,Nevada,2.4,3.0,-1.5,False
five,2002,Nevada,2.9,4.0,-1.7,False
six,2003,Nevada,3.2,5.0,,False



Note: New columns cannot be created with the frame2.eastern syntax
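A small sketch of what the note means, using a hypothetical two-row frame named `df_demo`. Attribute assignment only sets a Python attribute on the object (pandas emits a warning about it, which is silenced here), while bracket assignment creates the column.

```python
import warnings
import pandas as pd

df_demo = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # pandas warns about this pattern
    df_demo.eastern = df_demo['state'] == 'Ohio'   # plain attribute, no column

created_by_attribute = 'eastern' in df_demo.columns
print(created_by_attribute)

# Bracket syntax is the way to create a new column
df_demo['eastern'] = df_demo['state'] == 'Ohio'
created_by_brackets = 'eastern' in df_demo.columns
print(created_by_brackets)
```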

In [61]:
# The del keyword can then be used to remove this column:
del frame2['eastern']
frame2

Unnamed: 0,year,state,pop,dept,debt
one,2000,Ohio,1.5,0.0,
two,2001,Ohio,1.7,1.0,-1.2
three,2002,Ohio,3.6,2.0,
four,2001,Nevada,2.4,3.0,-1.5
five,2002,Nevada,2.9,4.0,-1.7
six,2003,Nevada,3.2,5.0,


Another common form of data is a nested dict of dicts. If the nested dict is passed to the DataFrame, pandas will interpret the outer dict keys as the columns and the inner keys as the row indices:

In [62]:
pop = {'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)
frame3

Unnamed: 0,Nevada,Ohio
2001,2.4,1.7
2002,2.9,3.6
2000,,1.5


In [63]:
frame3.T

Unnamed: 0,2001,2002,2000
Nevada,2.4,2.9,
Ohio,1.7,3.6,1.5


In [64]:
pd.DataFrame(pop, index = [2001, 2002, 2003])

Unnamed: 0,Nevada,Ohio
2001,2.4,1.7
2002,2.9,3.6
2003,,


# Index Objects

pandas’s Index objects are responsible for holding the axis labels and other metadata (like the axis name or names). Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an Index:

In [65]:
obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index
index

Index(['a', 'b', 'c'], dtype='object')

In [66]:
index[1:]

Index(['b', 'c'], dtype='object')

NOTE: Index objects are immutable and cannot be modified by the user.
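To illustrate the immutability, the sketch below tries to overwrite one label of a hypothetical Index named `idx` and catches the resulting error.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])

try:
    idx[1] = 'd'          # item assignment is not supported on an Index
    mutated = True
except TypeError as exc:
    mutated = False
    print('TypeError:', exc)
```

Immutability is what makes it safe to share one Index object among several Series or DataFrames.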

In [67]:
labels = pd.Index(np.arange(3))
labels

Int64Index([0, 1, 2], dtype='int64')

In [68]:
obj2 = pd.Series([1.5, -2.5, 0], index=labels)
obj2

0    1.5
1   -2.5
2    0.0
dtype: float64

In [69]:
obj2.index is labels

True

NOTE: Unlike Python sets, a pandas Index can contain duplicate labels:

In [70]:
dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
dup_labels

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

# Reindexing
An important method on pandas objects is reindex, which means to create a new object with the data conformed to a new index.

In [71]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [72]:
obj2 = obj.reindex(['a','b','c','d','e'])
obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

For ordered data like time series, it may be desirable to do some interpolation or filling of values when reindexing. The method option allows us to do this, using a method such as ffill, which forward-fills the values:

In [73]:
obj3 = pd.Series(['blue','yellow','red'], ['one','two','three'])
obj3

one        blue
two      yellow
three       red
dtype: object
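Forward-filling needs an ordered index, so the sketch below uses integer labels rather than the string labels of obj3. The Series `colors` and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical Series with gaps in its integer labels
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Reindex to 0..5; each new label takes the last valid value before it
filled = colors.reindex(range(6), method='ffill')
print(filled)
```

Labels 1, 3, and 5 do not exist in colors, so each of them receives the value of the nearest preceding label.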

In [74]:
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
 index=['a', 'c', 'd'],
 columns=['Ohio', 'Texas', 'California'])

frame

Unnamed: 0,Ohio,Texas,California
a,0,1,2
c,3,4,5
d,6,7,8


In [75]:
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
frame2

Unnamed: 0,Ohio,Texas,California
a,0.0,1.0,2.0
b,,,
c,3.0,4.0,5.0
d,6.0,7.0,8.0


In [76]:
states = ['Texas', 'Utah', 'California']
frame.reindex(columns=states)

Unnamed: 0,Texas,Utah,California
a,1,,2
c,4,,5
d,7,,8


# Dropping Entries from an Axis

In [77]:
# The drop method returns a new object with the indicated value or values deleted from an axis:
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [78]:
obj.drop('c')

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

In [79]:
obj.drop(['d', 'c'])

a    0.0
b    1.0
e    4.0
dtype: float64

With DataFrame, index values can be deleted from either axis. To illustrate this, we first create an example DataFrame:

In [80]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['Ohio', 'Colorado', 'Utah', 'New York'],
 columns=['one', 'two', 'three', 'four'])

data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15
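Dropping from either axis of a DataFrame might look like the sketch below. It rebuilds the same table under the hypothetical name `data_demo` so the notebook's data variable is left unchanged.

```python
import numpy as np
import pandas as pd

data_demo = pd.DataFrame(np.arange(16).reshape((4, 4)),
                         index=['Ohio', 'Colorado', 'Utah', 'New York'],
                         columns=['one', 'two', 'three', 'four'])

# Row labels are dropped by default (axis 0)
rows_dropped = data_demo.drop(['Colorado', 'Ohio'])

# Pass axis='columns' (or axis=1) to drop column labels instead
cols_dropped = data_demo.drop(['two', 'four'], axis='columns')

print(rows_dropped)
print(cols_dropped)
```

Both calls return new objects; data_demo itself keeps all four rows and columns.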


In [81]:
data['two']

Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int64

In [82]:
data[['three', 'one']]

Unnamed: 0,three,one
Ohio,2,0
Colorado,6,4
Utah,10,8
New York,14,12


In [None]:
#Indexing like this has a few special cases. First, slicing or selecting data with a boolean array:

In [83]:
data[:2]

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7


In [84]:
data[data['three'] > 5]

Unnamed: 0,one,two,three,four
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


Another use case is in indexing with a boolean DataFrame, such as one produced by a scalar comparison:

In [85]:
data < 5

Unnamed: 0,one,two,three,four
Ohio,True,True,True,True
Colorado,True,False,False,False
Utah,False,False,False,False
New York,False,False,False,False


In [86]:
data[data<5]=0
data

Unnamed: 0,one,two,three,four
Ohio,0,0,0,0
Colorado,0,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


This makes DataFrame syntactically more like a two-dimensional NumPy array in this particular case.

# Selection with loc and iloc

For DataFrame label-indexing on the rows, I introduce the special indexing operators loc and iloc. They enable you to select a subset of the rows and columns from a DataFrame with NumPy-like notation using either axis labels (loc) or integers (iloc). As a preliminary example, let’s select a single row and multiple columns by label:

In [87]:
data.loc['Colorado', ['two', 'three']]

two      5
three    6
Name: Colorado, dtype: int64

In [88]:
data.iloc[2, [3, 0, 1]]

four    11
one      8
two      9
Name: Utah, dtype: int64

In [91]:
data.iloc[2]

one       8
two       9
three    10
four     11
Name: Utah, dtype: int64

In [94]:
data.iloc[[1, 2], [3, 0, 1]]

Unnamed: 0,four,one,two
Colorado,7,0,5
Utah,11,8,9


Both indexing functions work with slices in addition to single labels or lists of labels:

In [95]:
data.loc[:'Utah', 'two']

Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int64

In [96]:
data.iloc[:, :3][data.three > 5]

Unnamed: 0,one,two,three
Colorado,0,5,6
Utah,8,9,10
New York,12,13,14


# Integer Indexes

In [98]:
ser =pd.Series(np.arange(6))
ser

0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64

In [100]:
ser[:1]

0    0
dtype: int64

In [101]:
ser.loc[:1]

0    0
1    1
dtype: int64

In [103]:
ser.iloc[:1]

0    0
dtype: int64

# Arithmetic and Data Alignment

An important pandas feature for some applications is the behavior of arithmetic between objects with different indexes. When you are adding together objects, if any index pairs are not the same, the respective index in the result will be the union of the index pairs. For users with database experience, this is similar to an automatic outer join on the index labels.

In [105]:
ser1 = pd.Series([1.2, 3.4, 5.6, 7.7], ['a', 'b', 'c', 'd'])
ser1

a    1.2
b    3.4
c    5.6
d    7.7
dtype: float64

In [106]:
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], ['a', 'c', 'e', 'f', 'g'])
s2

a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

In [108]:
ser1 + s2

a   -0.9
b    NaN
c    9.2
d    NaN
e    NaN
f    NaN
g    NaN
dtype: float64

The internal data alignment introduces missing values in the label locations that don’t overlap. Missing values will then propagate in further arithmetic computations.

In the case of DataFrame, alignment is performed on both the rows and the columns:

In [110]:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
  index=['Ohio', 'Texas', 'Colorado'])
df1

Unnamed: 0,b,c,d
Ohio,0.0,1.0,2.0
Texas,3.0,4.0,5.0
Colorado,6.0,7.0,8.0


In [118]:
df2 = pd.DataFrame(np.arange(12).reshape((4,3)), columns=list('bde'),
 index=['Utah', 'Ohio', 'Texas', 'Oregon'])
df2

Unnamed: 0,b,d,e
Utah,0,1,2
Ohio,3,4,5
Texas,6,7,8
Oregon,9,10,11


In [119]:
df1 + df2

Unnamed: 0,b,c,d,e
Colorado,,,,
Ohio,3.0,,6.0,
Oregon,,,,
Texas,9.0,,12.0,
Utah,,,,


Since the 'c' and 'e' columns are not found in both DataFrame objects, they appear as all missing in the result. The same holds for the rows whose labels are not common to both objects.

In [None]:
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)),
 columns=list('abcd'))
df1

In [124]:
df2 = pd.DataFrame(np.arange(20.).reshape((4,5)), columns = list('abcde'))
df2

Unnamed: 0,a,b,c,d,e
0,0.0,1.0,2.0,3.0,4.0
1,5.0,6.0,7.0,8.0,9.0
2,10.0,11.0,12.0,13.0,14.0
3,15.0,16.0,17.0,18.0,19.0


In [125]:
df2.loc[1, 'c'] = np.nan
df2

Unnamed: 0,a,b,c,d,e
0,0.0,1.0,2.0,3.0,4.0
1,5.0,6.0,,8.0,9.0
2,10.0,11.0,12.0,13.0,14.0
3,15.0,16.0,17.0,18.0,19.0


In [126]:
df1 +df2

Unnamed: 0,a,b,c,d,e
0,0.0,2.0,4.0,6.0,
1,9.0,11.0,,15.0,
2,18.0,20.0,22.0,24.0,
3,,,,,


In [128]:
df1.add(df2, fill_value = 0)

Unnamed: 0,a,b,c,d,e
0,0.0,2.0,4.0,6.0,4.0
1,9.0,11.0,6.0,15.0,9.0
2,18.0,20.0,22.0,24.0,14.0
3,15.0,16.0,17.0,18.0,19.0


In [130]:
1/df1

Unnamed: 0,a,b,c,d
0,inf,1.0,0.5,0.333333
1,0.25,0.2,0.166667,0.142857
2,0.125,0.111111,0.1,0.090909


# Function Application and Mapping

NumPy ufuncs (element-wise array methods) also work with pandas objects:

In [132]:
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
 index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

Unnamed: 0,b,d,e
Utah,0.222769,-1.679763,0.834016
Ohio,0.012729,0.236103,0.488319
Texas,-0.077997,1.469313,0.588606
Oregon,-0.735376,1.69797,-0.690598


In [133]:
np.abs(frame)

Unnamed: 0,b,d,e
Utah,0.222769,1.679763,0.834016
Ohio,0.012729,0.236103,0.488319
Texas,0.077997,1.469313,0.588606
Oregon,0.735376,1.69797,0.690598


# Sorting and Ranking

In [135]:
obj = pd.Series(range(4), index=['a','b','c','d'])
obj

a    0
b    1
c    2
d    3
dtype: int64

In [138]:
obj.sort_index()

a    0
b    1
c    2
d    3
dtype: int64

With a DataFrame, you can sort by index on either axis:

In [141]:

frame = pd.DataFrame(np.arange(8).reshape((2, 4)), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
frame

Unnamed: 0,d,a,b,c
three,0,1,2,3
one,4,5,6,7


In [143]:
frame.sort_index()

Unnamed: 0,d,a,b,c
one,4,5,6,7
three,0,1,2,3


In [None]:
frame.sort_index(axis = 1)

The data is sorted in ascending order by default, but can be sorted in descending order, too:

In [148]:
frame.sort_index(axis=1, ascending=False)

Unnamed: 0,d,c,b,a
three,0,3,2,1
one,4,7,6,5


To sort a Series by its values, use its sort_values method:

In [152]:
obj = pd.Series([5,4.3,2,5,6])
obj

0    5.0
1    4.3
2    2.0
3    5.0
4    6.0
dtype: float64

In [153]:
obj.sort_values()

2    2.0
1    4.3
0    5.0
3    5.0
4    6.0
dtype: float64

Any missing values are sorted to the end of the Series by default:

In [154]:
obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
obj

0    4.0
1    NaN
2    7.0
3    NaN
4   -3.0
5    2.0
dtype: float64

In [155]:
obj.sort_values()

4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64

When sorting a DataFrame, you can use the data in one or more columns as the sort keys. To do so, pass one or more column names to the by option of sort_values:

In [156]:
frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
frame

Unnamed: 0,b,a
0,4,0
1,7,1
2,-3,0
3,2,1


In [158]:
frame.sort_values(by ='b')

Unnamed: 0,b,a
2,-3,0
3,2,1
0,4,0
1,7,1


In [159]:
frame.sort_values(by=['a', 'b'])

Unnamed: 0,b,a
2,-3,0
0,4,0
3,2,1
1,7,1


Ranking assigns ranks from one through the number of valid data points in an array. The rank methods for Series and DataFrame are the place to look; by default rank breaks ties by assigning each group the mean rank:

In [161]:
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj

0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

In [162]:
obj.rank()

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64

In [163]:
obj.rank(method='first')

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

Here, instead of using the average rank 6.5 for the entries 0 and 2, they instead have been set to 6 and 7 because label 0 precedes label 2 in the data.

In [165]:
obj.rank(ascending=True, method='max')

0    7.0
1    1.0
2    7.0
3    5.0
4    3.0
5    2.0
6    5.0
dtype: float64

In [167]:
frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1], 'c': [-2, 5, 8, -2.5]})
frame.rank(axis='columns')

Unnamed: 0,b,a,c
0,3.0,2.0,1.0
1,3.0,1.0,2.0
2,1.0,2.0,3.0
3,3.0,2.0,1.0


# Summarizing and Computing Descriptive Statistics

pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of reductions or summary statistics, methods that extract a single value (like the sum or mean) from a Series or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods found on NumPy arrays, they have built-in handling for missing data.

In [168]:
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
df

Unnamed: 0,one,two
a,1.4,
b,7.1,-4.5
c,,
d,0.75,-1.3


In [169]:
df.sum()

one    9.25
two   -5.80
dtype: float64

In [170]:
df.sum(axis = 1)

a    1.40
b    2.60
c    0.00
d   -0.55
dtype: float64

In [174]:
df.mean(axis='columns', skipna=False)

a      NaN
b    1.300
c      NaN
d   -0.275
dtype: float64

In [171]:
df.describe()

Unnamed: 0,one,two
count,3.0,2.0
mean,3.083333,-2.9
std,3.493685,2.262742
min,0.75,-4.5
25%,1.075,-3.7
50%,1.4,-2.9
75%,4.25,-2.1
max,7.1,-1.3


In [175]:
obj = pd.Series(['a', 'a', 'b', 'c'] * 4)
obj

0     a
1     a
2     b
3     c
4     a
5     a
6     b
7     c
8     a
9     a
10    b
11    c
12    a
13    a
14    b
15    c
dtype: object

In [176]:
obj.describe()

count     16
unique     3
top        a
freq       8
dtype: object