**From dict of Series or dicts**

In [1]:
import numpy as np
import pandas as pd

In [2]:
d = {'one': pd.Series([2., 3., 4.], index=['p', 'q', 'r']),
     'two': pd.Series([2., 3., 4., 5.], index=['p', 'q', 'r', 's'])}

In [3]:
df = pd.DataFrame(d)
df

Unnamed: 0,one,two
p,2.0,2.0
q,3.0,3.0
r,4.0,4.0
s,,5.0


In [4]:
pd.DataFrame(d, index=['s', 'q', 'p'])

Unnamed: 0,one,two
s,,5.0
q,3.0,3.0
p,2.0,2.0


In [5]:
pd.DataFrame(d, index=['s', 'q', 'p'], columns=['two', 'three'])

Unnamed: 0,two,three
s,5.0,
q,3.0,
p,2.0,


The row and column labels are available through the index and columns attributes, respectively:

In [6]:
df.index

Index(['p', 'q', 'r', 's'], dtype='object')

In [7]:
df.columns

Index(['one', 'two'], dtype='object')

**From dict of ndarrays / lists**

The ndarrays must all be the same length.<br>
If an index is passed, it must also be the same length as the arrays.<br>
If no index is passed, the index will be range(n), where n is the array length.

In [8]:
d = {'one': [4., 5., 6., 7.],
     'two': [7., 6., 5., 4.]}

In [9]:
pd.DataFrame(d)

Unnamed: 0,one,two
0,4.0,7.0
1,5.0,6.0
2,6.0,5.0
3,7.0,4.0


In [10]:
pd.DataFrame(d, index=['w', 'x', 'y', 'z'])

Unnamed: 0,one,two
w,4.0,7.0
x,5.0,6.0
y,6.0,5.0
z,7.0,4.0
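
The length rules above can be checked directly. The following is a minimal sketch; the variable names (default_index, index_error, length_error) are illustrative and not part of the original notebook.

```python
import pandas as pd

d = {'one': [4., 5., 6., 7.],
     'two': [7., 6., 5., 4.]}

# With no index passed, pandas builds a default index 0..n-1.
default_index = pd.DataFrame(d).index
print(default_index)

# An index whose length differs from the arrays is rejected.
try:
    pd.DataFrame(d, index=['w', 'x', 'y'])
    index_error = None
except ValueError as exc:
    index_error = exc
    print('index mismatch:', exc)

# Lists of unequal length are rejected as well.
try:
    pd.DataFrame({'one': [4., 5., 6.], 'two': [7., 6.]})
    length_error = None
except ValueError as exc:
    length_error = exc
    print('length mismatch:', exc)
```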


**From structured or record array**

In [11]:
data = np.zeros((2, ), dtype=[('P', 'i4'), ('Q', 'f4'), ('R', 'S10')])

In [12]:
data[:] = [(2, 3., 'Best'), (3, 4., "Friend")]

In [13]:
pd.DataFrame(data)

Unnamed: 0,P,Q,R
0,2,3.0,b'Best'
1,3,4.0,b'Friend'


In [14]:
pd.DataFrame(data, index=['first', 'second'])

Unnamed: 0,P,Q,R
first,2,3.0,b'Best'
second,3,4.0,b'Friend'


In [15]:
pd.DataFrame(data, columns=['R', 'P', 'Q'])

Unnamed: 0,R,P,Q
0,b'Best',2,3.0
1,b'Friend',3,4.0


**From a list of dicts**

In [16]:
data2 = [{'p': 2, 'q': 4}, {'p': 5, 'q': 10, 'r': 15}]

In [17]:
pd.DataFrame(data2)

Unnamed: 0,p,q,r
0,2,4,
1,5,10,15.0


In [18]:
pd.DataFrame(data2, index=['first', 'second'])

Unnamed: 0,p,q,r
first,2,4,
second,5,10,15.0


In [19]:
pd.DataFrame(data2, columns=['p', 'q'])

Unnamed: 0,p,q
0,2,4
1,5,10


**From a dict of tuples**<br>
You can automatically create a MultiIndexed frame by passing a dict whose keys are tuples.

In [20]:
pd.DataFrame({('p', 'q'): {('P', 'Q'): 2, ('P', 'R'): 1},
               ('p', 'p'): {('P', 'R'): 4, ('P', 'Q'): 3},
               ('p', 'r'): {('P', 'Q'): 6, ('P', 'R'): 5},
               ('q', 'p'): {('P', 'R'): 8, ('P', 'Q'): 7},
               ('q', 'q'): {('P', 'S'): 10, ('P', 'Q'): 9}})

Unnamed: 0_level_0,Unnamed: 1_level_0,p,p,p,q,q
Unnamed: 0_level_1,Unnamed: 1_level_1,q,p,r,p,q
P,Q,2.0,3.0,6.0,7.0,9.0
P,R,1.0,4.0,5.0,8.0,
P,S,,,,,10.0
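
To see what the tuple keys turn into, the sketch below builds a smaller frame of the same shape and inspects both axes. The name frame and the reduced data are assumptions made for illustration.

```python
import pandas as pd

frame = pd.DataFrame({('p', 'q'): {('P', 'Q'): 2, ('P', 'R'): 1},
                      ('p', 'r'): {('P', 'Q'): 6, ('P', 'R'): 5},
                      ('q', 'p'): {('P', 'Q'): 7, ('P', 'R'): 8}})

# Tuple keys become the levels of a MultiIndex on both axes.
print(type(frame.columns).__name__, frame.columns.nlevels)
print(type(frame.index).__name__, frame.index.nlevels)

# Selecting an outer column label returns the sub-frame beneath it.
sub = frame['p']
print(list(sub.columns))

# A full row tuple plus a full column tuple selects a single cell.
cell = frame.loc[('P', 'R'), ('q', 'p')]
print(cell)
```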


**Missing data**

To construct a DataFrame with missing data, we use np.nan to represent missing values.<br>
Alternatively, you may pass a numpy.MaskedArray as the data argument to the DataFrame<br>
constructor, and its masked entries will be considered missing.

In [21]:
df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two', 'three'])

In [22]:
df['four'] = 'bar'

In [23]:
df['five'] = df['one'] > 0
df

Unnamed: 0,one,two,three,four,five
a,0.374851,1.575271,-2.633,bar,True
b,-0.722451,-0.446845,0.933967,bar,False
c,-1.386522,1.575828,2.371845,bar,False
d,1.422895,-1.145874,1.347855,bar,True


In [24]:
df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df2

Unnamed: 0,one,two,three,four,five
a,0.374851,1.575271,-2.633,bar,True
b,-0.722451,-0.446845,0.933967,bar,False
c,-1.386522,1.575828,2.371845,bar,False
d,1.422895,-1.145874,1.347855,bar,True
e,,,,,
f,,,,,
g,,,,,
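
The MaskedArray route mentioned above can be sketched as follows. The array contents and the names values, mask and masked_df are illustrative assumptions; floating-point data is used so that the missing entry can be stored as NaN without a dtype change.

```python
import numpy as np
import pandas as pd

values = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[False, True], [False, False]])  # True marks a masked entry
masked = np.ma.masked_array(values, mask=mask)

# Masked entries are treated as missing in the resulting frame.
masked_df = pd.DataFrame(masked, columns=['one', 'two'])
print(masked_df)
```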


**Alternate constructors**<br>
**DataFrame.from_dict**<br>
DataFrame.from_dict takes a dict of dicts or a dict of array-like sequences and returns a DataFrame.<br>
It operates like the DataFrame constructor except for the orient parameter which is 'columns' by default,<br>
but which can be set to 'index' in order to use the dict keys as row labels.

In [25]:
pd.DataFrame.from_dict(dict([('P', [2, 3, 4]), ('Q', [5, 6, 7])]))

Unnamed: 0,P,Q
0,2,5
1,3,6
2,4,7


If you pass orient='index', the keys will be the row labels. In this case, you can also pass the desired<br>
column names:

In [26]:
pd.DataFrame.from_dict(dict([('P', [2, 3, 4]), ('Q', [5, 6, 7])]),
                       orient='index', columns=['one', 'two', 'three'])

Unnamed: 0,one,two,three
P,2,3,4
Q,5,6,7


**DataFrame.from_records**<br>
DataFrame.from_records takes a list of tuples or an ndarray with structured dtype. 

In [27]:
data

array([(2, 3., b'Best'), (3, 4., b'Friend')],
      dtype=[('P', '<i4'), ('Q', '<f4'), ('R', 'S10')])

In [28]:
pd.DataFrame.from_records(data, index='R')

Unnamed: 0_level_0,P,Q
R,Unnamed: 1_level_1,Unnamed: 2_level_1
b'Best',2,3.0
b'Friend',3,4.0
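
The example above uses a structured array. For the list-of-tuples case, a minimal sketch might look like this; the records and the names rec_df and indexed are made up for illustration, and the column names have to be supplied because plain tuples carry no field names.

```python
import pandas as pd

records = [(2, 3.0, 'Best'), (3, 4.0, 'Friend')]

# Plain tuples have no field names, so pass them via columns.
rec_df = pd.DataFrame.from_records(records, columns=['P', 'Q', 'R'])
print(rec_df)

# As with the structured array, one field can be used as the index.
indexed = pd.DataFrame.from_records(records, columns=['P', 'Q', 'R'], index='R')
print(indexed)
```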


**Column selection, addition, deletion**

In [29]:
df['one']

a    0.374851
b   -0.722451
c   -1.386522
d    1.422895
Name: one, dtype: float64

In [30]:
df['three'] = df['one'] * df['two']

In [31]:
df['flag'] = df['one'] > 2
df

Unnamed: 0,one,two,three,four,five,flag
a,0.374851,1.575271,0.590491,bar,True,False
b,-0.722451,-0.446845,0.322824,bar,False,False
c,-1.386522,1.575828,-2.18492,bar,False,False
d,1.422895,-1.145874,-1.630458,bar,True,False


Columns can be deleted or popped, as with a dict:

In [32]:
del df['two']

In [33]:
three = df.pop('three')

In [34]:
df

Unnamed: 0,one,four,five,flag
a,0.374851,bar,True,False
b,-0.722451,bar,False,False
c,-1.386522,bar,False,False
d,1.422895,bar,True,False


When inserting a scalar value, it will naturally be propagated to fill the column:

In [35]:
df['foo'] = 'bar'

In [36]:
df

Unnamed: 0,one,four,five,flag,foo
a,0.374851,bar,True,False,bar
b,-0.722451,bar,False,False,bar
c,-1.386522,bar,False,False,bar
d,1.422895,bar,True,False,bar


When inserting a Series that does not have the same index as the DataFrame, it will be conformed to the<br>
DataFrame’s index:

In [37]:
df['one_trunc'] = df['one'][:2]

In [38]:
df

Unnamed: 0,one,four,five,flag,foo,one_trunc
a,0.374851,bar,True,False,bar,0.374851
b,-0.722451,bar,False,False,bar,-0.722451
c,-1.386522,bar,False,False,bar,
d,1.422895,bar,True,False,bar,


You can insert raw ndarrays but their length must match the length of the DataFrame’s index.
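
A short sketch of the length requirement, using a small stand-in frame (the name frame and the column names ok and too_short are hypothetical):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0]}, index=['a', 'b', 'c', 'd'])

# An ndarray with one value per row is accepted.
frame['ok'] = np.arange(4)

# An ndarray of any other length is rejected, because there are
# no labels to align it on.
try:
    frame['too_short'] = np.arange(3)
    err = None
except ValueError as exc:
    err = exc
    print('rejected:', exc)
```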

By default, columns get inserted at the end. The insert function is available to insert at a particular<br>
location in the columns:

In [39]:
df.insert(1, 'bar', df['one'])

In [40]:
df

Unnamed: 0,one,bar,four,five,flag,foo,one_trunc
a,0.374851,0.374851,bar,True,False,bar,0.374851
b,-0.722451,-0.722451,bar,False,False,bar,-0.722451
c,-1.386522,-1.386522,bar,False,False,bar,
d,1.422895,1.422895,bar,True,False,bar,


**Assigning new columns in method chains**

In [41]:
iris = pd.read_csv('https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv')

In [42]:
iris.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [43]:
(iris.assign(sepal_ratio=iris['sepal_width'] / iris['sepal_length'])
      .head())

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,sepal_ratio
0,5.1,3.5,1.4,0.2,setosa,0.686275
1,4.9,3.0,1.4,0.2,setosa,0.612245
2,4.7,3.2,1.3,0.2,setosa,0.680851
3,4.6,3.1,1.5,0.2,setosa,0.673913
4,5.0,3.6,1.4,0.2,setosa,0.72


In the example above, we inserted a precomputed value. We can also pass in a function of one argument to be<br>
evaluated on the DataFrame being assigned to.

In [44]:
iris.assign(sepal_ratio=lambda x: (x['sepal_width'] / x['sepal_length'])).head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,sepal_ratio
0,5.1,3.5,1.4,0.2,setosa,0.686275
1,4.9,3.0,1.4,0.2,setosa,0.612245
2,4.7,3.2,1.3,0.2,setosa,0.680851
3,4.6,3.1,1.5,0.2,setosa,0.673913
4,5.0,3.6,1.4,0.2,setosa,0.72


**assign always returns a copy of the data, leaving the original DataFrame untouched.**
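
This can be verified on a small frame. The sketch below uses made-up data and the hypothetical names original and with_ratio rather than the iris dataset, so it runs without a network connection.

```python
import pandas as pd

original = pd.DataFrame({'width': [3.5, 3.0], 'length': [5.0, 4.0]})

# assign returns a new frame that carries the extra column...
with_ratio = original.assign(ratio=lambda x: x['width'] / x['length'])
print(with_ratio)

# ...while the frame it was called on keeps its original columns.
print(list(original.columns))
```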

In [45]:
(iris.query('sepal_length > 4')
      .assign(sepal_ratio=lambda x: x.sepal_width / x.sepal_length,
            petal_ratio=lambda x: x.petal_width / x.petal_length)
      .plot(kind='scatter', x='sepal_ratio', y='petal_ratio'))

<matplotlib.axes._subplots.AxesSubplot at 0x92360f0>

**Indexing / selection**<br>
The basics of indexing are as follows:

| Operation | Syntax | Result |
|---|---|---|
| Select column | df[col] | Series |
| Select row by label | df.loc[label] | Series |
| Select row by integer location | df.iloc[loc] | Series |
| Slice rows | df[5:10] | DataFrame |
| Select rows by boolean vector | df[bool_vec] | DataFrame |

Row selection, for example, returns a Series whose index is the columns of the DataFrame:

In [47]:
import numpy as np
import pandas as pd

In [48]:
d = {'one': pd.Series([2., 3., 4.], index=['p', 'q', 'r']),
     'two': pd.Series([2., 3., 4., 5.], index=['p', 'q', 'r', 's'])}

In [49]:
df = pd.DataFrame(d)

In [50]:
df.loc['q']

one    3.0
two    3.0
Name: q, dtype: float64

For a more exhaustive treatment of sophisticated label-based indexing and slicing, see the section<br>
on indexing. We will address the fundamentals of reindexing / conforming to new sets of labels in the<br>
section on reindexing.
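
The basic indexing operations listed earlier can be tried side by side. This is a sketch on a small made-up frame; the variable names are illustrative, and the slice bounds are adapted to the four-row frame.

```python
import pandas as pd

frame = pd.DataFrame({'one': [2., 3., 4., 5.], 'two': [6., 7., 8., 9.]},
                     index=['p', 'q', 'r', 's'])

col = frame['one']                   # select column -> Series
row_label = frame.loc['q']           # select row by label -> Series
row_pos = frame.iloc[1]              # select row by integer location -> Series
sliced = frame[1:3]                  # slice rows -> DataFrame
filtered = frame[frame['one'] > 3]   # boolean vector -> DataFrame

for name, obj in [('col', col), ('row_label', row_label), ('row_pos', row_pos),
                  ('sliced', sliced), ('filtered', filtered)]:
    print(name, type(obj).__name__)
```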

**Data alignment and arithmetic**<br>
Operations between DataFrame objects automatically align on both the columns and the index<br>
(row labels). Again, the resulting object will have the union of the column and row labels.

In [51]:
import numpy as np
import pandas as pd

In [52]:
df = pd.DataFrame(np.random.randn(8, 4), columns=['P', 'Q', 'R', 'S'])

In [53]:
df2 = pd.DataFrame(np.random.randn(9, 3), columns=['P', 'Q', 'R'])

In [54]:
df + df2

Unnamed: 0,P,Q,R,S
0,-1.359783,-0.362051,2.13886,
1,1.336226,-0.040574,-1.115697,
2,0.607396,0.102404,-0.569467,
3,0.447739,0.898215,-0.298585,
4,-0.080552,0.106972,-1.896163,
5,0.279951,3.03925,0.895204,
6,2.216397,0.493784,1.881839,
7,0.234132,-1.667978,1.39171,
8,,,,


When doing an operation between DataFrame and Series, the default behavior is to align the Series index<br>
on the DataFrame columns, thus broadcasting row-wise. For example:

In [55]:
df - df.iloc[0]

Unnamed: 0,P,Q,R,S
0,0.0,0.0,0.0,0.0
1,2.341762,-0.437433,-1.77401,-1.113588
2,0.865723,0.373685,-1.407959,-3.147135
3,2.369243,0.895406,-1.702355,-1.851926
4,1.95828,1.686684,-2.91375,-3.017007
5,2.2283,1.197087,0.170023,-2.952011
6,2.629536,0.35969,-1.355995,-2.31686
7,-0.537236,0.538296,-0.258632,-2.835196


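When the DataFrame index contains dates, as with time series data, subtracting a column does not broadcast<br>
column-wise. The Series index (the dates) is still aligned on the DataFrame columns, so the result has the<br>
union of both sets of labels and is filled with missing values: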

In [56]:
index = pd.date_range('1/1/2019', periods=6)

In [57]:
df = pd.DataFrame(np.random.randn(6, 3), index=index, columns=list('XYZ'))

In [58]:
df

Unnamed: 0,X,Y,Z
2019-01-01,0.723344,-0.086666,0.345296
2019-01-02,0.447144,-0.016384,-0.256257
2019-01-03,0.100551,-1.133672,0.038232
2019-01-04,-1.461892,-0.21772,-0.919715
2019-01-05,-0.093798,-0.058988,-0.569805
2019-01-06,-0.594957,0.881995,0.161602


In [59]:
type(df['X'])

pandas.core.series.Series

In [60]:
df - df['X']

Unnamed: 0,2019-01-01 00:00:00,2019-01-02 00:00:00,2019-01-03 00:00:00,2019-01-04 00:00:00,2019-01-05 00:00:00,2019-01-06 00:00:00,X,Y,Z
2019-01-01,,,,,,,,,
2019-01-02,,,,,,,,,
2019-01-03,,,,,,,,,
2019-01-04,,,,,,,,,
2019-01-05,,,,,,,,,
2019-01-06,,,,,,,,,


For explicit control over the matching and broadcasting behavior, use the flexible arithmetic methods (add, sub, mul, div), which accept an axis argument.
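
As a minimal sketch of column-wise broadcasting, sub with axis=0 aligns the Series on the row labels instead of the columns. The frame name ts and the random data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

index = pd.date_range('1/1/2019', periods=6)
ts = pd.DataFrame(np.random.randn(6, 3), index=index, columns=list('XYZ'))

# axis=0 matches the Series index against the row labels (the dates),
# so column X is subtracted from every column.
centered = ts.sub(ts['X'], axis=0)
print(centered)
```

With this form the result keeps the original shape and labels, and column X becomes all zeros.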

Operations with scalars are just as you would expect:

In [61]:
df * 4 + 2

Unnamed: 0,X,Y,Z
2019-01-01,4.893376,1.653334,3.381182
2019-01-02,3.788575,1.934465,0.974971
2019-01-03,2.402204,-2.534689,2.152928
2019-01-04,-3.847566,1.129118,-1.678859
2019-01-05,1.624809,1.764047,-0.279221
2019-01-06,-0.379829,5.527982,2.646408


In [62]:
1 / df

Unnamed: 0,X,Y,Z
2019-01-01,1.382468,-11.538486,2.89607
2019-01-02,2.236417,-61.035978,-3.90233
2019-01-03,9.945196,-0.882089,26.156106
2019-01-04,-0.684045,-4.593046,-1.087294
2019-01-05,-10.661233,-16.952541,-1.754986
2019-01-06,-1.680793,1.133793,6.188043


In [63]:
df ** 6

Unnamed: 0,X,Y,Z
2019-01-01,0.1432417,4.237475e-07,0.001694907
2019-01-02,0.00799251,1.934125e-11,0.0002831756
2019-01-03,1.033523e-06,2.12288,3.122924e-09
2019-01-04,9.760924,0.0001065112,0.6052274
2019-01-05,6.810132e-07,4.212997e-08,0.03422619
2019-01-06,0.04435215,0.4707584,1.781066e-05


Boolean operators work as well:

In [64]:
df1 = pd.DataFrame({'x': [1, 0, 1], 'y': [0, 1, 1]}, dtype=bool)

In [65]:
df2 = pd.DataFrame({'x': [0, 1, 1], 'y': [1, 1, 0]}, dtype=bool)

In [66]:
df1 & df2

Unnamed: 0,x,y
0,False,False
1,False,True
2,True,False


In [67]:
df1 | df2

Unnamed: 0,x,y
0,True,True
1,True,True
2,True,True


In [68]:
df1 ^ df2

Unnamed: 0,x,y
0,True,True
1,True,False
2,False,True


In [69]:
-df1

Unnamed: 0,x,y
0,False,True
1,True,False
2,False,False


**Transposing**<br>
To transpose, access the T attribute. The example below transposes the first five rows:

In [70]:
df[:5].T

Unnamed: 0,2019-01-01 00:00:00,2019-01-02 00:00:00,2019-01-03 00:00:00,2019-01-04 00:00:00,2019-01-05 00:00:00
X,0.723344,0.447144,0.100551,-1.461892,-0.093798
Y,-0.086666,-0.016384,-1.133672,-0.21772,-0.058988
Z,0.345296,-0.256257,0.038232,-0.919715,-0.569805


**DataFrame interoperability with NumPy functions**

In [71]:
np.exp(df)

Unnamed: 0,X,Y,Z
2019-01-01,2.061315,0.916983,1.412407
2019-01-02,1.563839,0.98375,0.773943
2019-01-03,1.10578,0.321849,1.038972
2019-01-04,0.231797,0.80435,0.398633
2019-01-05,0.910467,0.942718,0.565636
2019-01-06,0.551586,2.415715,1.175392


In [72]:
np.asarray(df)

array([[ 0.72334402, -0.08666649,  0.34529553],
       [ 0.44714378, -0.01638378, -0.25625714],
       [ 0.10055106, -1.13367237,  0.03823199],
       [-1.46189155, -0.21772045, -0.91971464],
       [-0.09379779, -0.05898821, -0.56980521],
       [-0.59495728,  0.88199546,  0.16160197]])

pandas automatically aligns labeled inputs as part of a ufunc with multiple inputs.<br>
For example, using numpy.remainder() on two Series with differently ordered labels will<br>
align before the operation.

In [73]:
ser1 = pd.Series([2, 3, 4], index=['p', 'q', 'r'])

In [74]:
ser2 = pd.Series([3, 4, 5], index=['q', 'p', 'r'])

In [75]:
ser1

p    2
q    3
r    4
dtype: int64

In [76]:
ser2

q    3
p    4
r    5
dtype: int64

In [77]:
np.remainder(ser1, ser2)

p    2
q    0
r    4
dtype: int64

As usual, the union of the two indices is taken, and non-overlapping values are filled with missing values.

In [78]:
ser3 = pd.Series([4, 6, 8], index=['q', 'r', 's'])

In [79]:
ser3

q    4
r    6
s    8
dtype: int64

In [80]:
np.remainder(ser1, ser3)

p    NaN
q    3.0
r    4.0
s    NaN
dtype: float64

When a binary ufunc is applied to a Series and Index, the Series implementation takes precedence and<br>
a Series is returned.

In [81]:
ser = pd.Series([2, 3, 4])

In [82]:
idx = pd.Index([5, 6, 7])

In [83]:
np.maximum(ser, idx)

0    5
1    6
2    7
dtype: int64

NumPy ufuncs are safe to apply to Series backed by non-ndarray arrays.<br>
If possible, the ufunc is applied without converting the underlying data to an ndarray.
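
One way to observe this is with a Series backed by a nullable integer array. The sketch below assumes the 'Int64' extension dtype, which is not used elsewhere in this notebook; the names nullable and result are illustrative.

```python
import numpy as np
import pandas as pd

# 'Int64' (capital I) is a nullable integer extension dtype, not an ndarray dtype.
nullable = pd.Series([-2, 3, None], dtype='Int64')

# The ufunc is applied to the extension array; the missing value stays missing.
result = np.abs(nullable)
print(result)
print(result.dtype)
```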

**Console display**

Very large DataFrames will be truncated to display them in the console.

In [84]:
baseball = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/doc/data/baseball.csv')

In [85]:
print(baseball)

       id     player  year  stint team  lg    g   ab   r    h  ...    rbi  \
0   88641  womacto01  2006      2  CHN  NL   19   50   6   14  ...    2.0   
1   88643  schilcu01  2006      1  BOS  AL   31    2   0    1  ...    0.0   
2   88645  myersmi01  2006      1  NYA  AL   62    0   0    0  ...    0.0   
3   88649  helliri01  2006      1  MIL  NL   20    3   0    0  ...    0.0   
4   88650  johnsra05  2006      1  NYA  AL   33    6   0    1  ...    0.0   
5   88652  finlest01  2006      1  SFN  NL  139  426  66  105  ...   40.0   
6   88653  gonzalu01  2006      1  ARI  NL  153  586  93  159  ...   73.0   
7   88662   seleaa01  2006      1  LAN  NL   28   26   2    5  ...    0.0   
8   89177  francju01  2007      2  ATL  NL   15   40   1   10  ...    8.0   
9   89178  francju01  2007      1  NYN  NL   40   50   7   10  ...    8.0   
10  89330   zaungr01  2007      1  TOR  AL  110  331  43   80  ...   52.0   
11  89333  witasja01  2007      1  TBA  AL    3    0   0    0  ...    0.0   

In [86]:
baseball.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 23 columns):
id        100 non-null int64
player    100 non-null object
year      100 non-null int64
stint     100 non-null int64
team      100 non-null object
lg        100 non-null object
g         100 non-null int64
ab        100 non-null int64
r         100 non-null int64
h         100 non-null int64
X2b       100 non-null int64
X3b       100 non-null int64
hr        100 non-null int64
rbi       100 non-null float64
sb        100 non-null float64
cs        100 non-null float64
bb        100 non-null int64
so        100 non-null float64
ibb       100 non-null float64
hbp       100 non-null float64
sh        100 non-null float64
sf        100 non-null float64
gidp      100 non-null float64
dtypes: float64(9), int64(11), object(3)
memory usage: 18.0+ KB


However, using to_string will return a string representation of the DataFrame in tabular form, though<br>
it won’t always fit the console width:

In [87]:
print(baseball.iloc[-20:, :10].to_string())

       id     player  year  stint team  lg    g   ab   r    h
80  89474  finlest01  2007      1  COL  NL   43   94   9   17
81  89480  embreal01  2007      1  OAK  AL    4    0   0    0
82  89481  edmonji01  2007      1  SLN  NL  117  365  39   92
83  89482  easleda01  2007      1  NYN  NL   76  193  24   54
84  89489  delgaca01  2007      1  NYN  NL  139  538  71  139
85  89493  cormirh01  2007      1  CIN  NL    6    0   0    0
86  89494  coninje01  2007      2  NYN  NL   21   41   2    8
87  89495  coninje01  2007      1  CIN  NL   80  215  23   57
88  89497  clemero02  2007      1  NYA  AL    2    2   0    1
89  89498  claytro01  2007      2  BOS  AL    8    6   1    0
90  89499  claytro01  2007      1  TOR  AL   69  189  23   48
91  89501  cirilje01  2007      2  ARI  NL   28   40   6    8
92  89502  cirilje01  2007      1  MIN  AL   50  153  18   40
93  89521  bondsba01  2007      1  SFN  NL  126  340  75   94
94  89523  biggicr01  2007      1  HOU  NL  141  517  68  130
95  8952

Wide DataFrames will be printed across multiple rows by default:

In [88]:
pd.DataFrame(np.random.randn(4, 10))

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,-1.162526,0.474868,0.15752,-0.226966,0.288694,-0.204342,-0.454281,-0.492041,0.626862,0.603523
1,0.741971,-0.570777,2.065719,0.844021,0.010747,1.188861,-1.831683,-0.340063,0.471729,0.140003
2,0.906849,0.335401,-0.159554,2.874012,1.539447,1.144927,-0.90208,-0.008464,0.475082,-0.746442
3,-0.377446,0.023604,-0.616251,1.465809,-0.25574,-1.556993,-1.046476,2.627271,1.470708,0.409682


You can change how much to print on a single row by setting the display.width option:

In [89]:
pd.set_option('display.width', 30)

In [90]:
pd.DataFrame(np.random.randn(4, 10))

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,-2.31865,-0.313811,1.507447,-0.665551,0.694109,-0.522585,0.100735,-0.246376,1.584034,-0.390303
1,1.120756,-0.513528,-1.517324,0.908516,0.356448,0.036096,2.135186,-0.004657,1.790709,-0.3023
2,-0.67745,1.360683,-0.929337,-0.739711,0.434875,0.171721,-0.394395,1.956881,0.874484,-0.963741
3,1.195355,0.090944,-0.69915,-0.653663,0.075695,-0.140222,0.132759,-0.377024,-0.758761,-0.64223


You can adjust the maximum width of individual columns by setting the display.max_colwidth option:

In [91]:
datafile = {'filename': ['filename_01', 'filename_02'],
             'path': ["media/user_name/storage/folder_01/filename_01",
                      "media/user_name/storage/folder_02/filename_02"]}

In [92]:
pd.set_option('display.max_colwidth', 40)

In [93]:
pd.DataFrame(datafile)

Unnamed: 0,filename,path
0,filename_01,media/user_name/storage/folder_01/fi...
1,filename_02,media/user_name/storage/folder_02/fi...


In [95]:
pd.set_option('display.max_colwidth', 100)

In [96]:
pd.DataFrame(datafile)

Unnamed: 0,filename,path
0,filename_01,media/user_name/storage/folder_01/filename_01
1,filename_02,media/user_name/storage/folder_02/filename_02


You can also disable this feature via the expand_frame_repr option. This will print the table in one block.
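
A sketch of switching the option off for a single print, using pd.option_context so that the global setting is restored afterwards. The frame wide and the names before, inside and after are illustrative.

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame(np.random.randn(4, 10))

before = pd.get_option('display.expand_frame_repr')

# Inside the context the option is off; it is restored on exit.
with pd.option_context('display.expand_frame_repr', False):
    inside = pd.get_option('display.expand_frame_repr')
    print(wide)

after = pd.get_option('display.expand_frame_repr')
```

The same pattern works for display.width and display.max_colwidth, which avoids changing display settings for the rest of the session.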

**DataFrame column attribute access and IPython completion**

If a DataFrame column label is a valid Python variable name, the column can be accessed like an attribute:

In [97]:
df = pd.DataFrame({'boo1': np.random.randn(4),
                   'boo2': np.random.randn(4)})
df

Unnamed: 0,boo1,boo2
0,1.073844,0.586491
1,1.050296,0.241025
2,0.726989,0.72595
3,1.253145,-0.064325


In [98]:
df.boo2

0    0.586491
1    0.241025
2    0.725950
3   -0.064325
Name: boo2, dtype: float64
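
Attribute access is a convenience on top of bracket access, and the sketch below shows two cases where brackets are still needed. The column names 'not valid' and 'min' are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({'boo1': [1, 2], 'not valid': [3, 4], 'min': [5, 6]})

# A valid identifier works with dot syntax and matches bracket access.
same = frame.boo1.equals(frame['boo1'])

# A label that clashes with an existing method resolves to the method,
# so the column has to be read with brackets.
is_method = callable(frame.min)
min_column = frame['min']

# A label containing a space cannot be written with dot syntax at all.
spaced = frame['not valid']

print(same, is_method, min_column.tolist(), spaced.tolist())
```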