# Fast scalar value get and set

A scalar value is a single unit of data, such as one cell of a DataFrame.

Because indexing with [ ] must handle many cases (single-label access, slicing, boolean indexing), 
it carries some overhead in working out what you are asking for. If you only want to access 
a scalar value, the fastest way is to use the at and iat methods, which are implemented on all 
the pandas data structures.
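
As a minimal sketch of scalar access with at and iat (the frame `df_demo` is a hypothetical stand-in for the example data built later in this notebook): at takes a row label and a column label, while iat takes a row position and a column position. Both can read and write a single cell.

```python
import pandas as pd

# Hypothetical example frame, modelled on the one used in this notebook.
df_demo = pd.DataFrame(
    {"c1": [11, 12, 13, 14], "c2": ["21", "22", "23", "24"]},
    index=["r1", "r2", "r3", "r4"],
)

# Label-based scalar access: row label first, then column label.
value_by_label = df_demo.at["r2", "c1"]

# Position-based scalar access: row position first, then column position.
value_by_position = df_demo.iat[1, 0]

# Both accessors can also set a single value in place.
df_demo.at["r2", "c1"] = 120
df_demo.iat[0, 0] = 110

print(value_by_label, value_by_position)
print(df_demo["c1"].tolist())
```

Here both reads address the same cell, once by label and once by position.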

The Python and NumPy indexing operators [] and attribute operator . provide quick and easy access to pandas data structures across a wide range of use cases. This makes interactive work intuitive, as there’s little new to learn if you already know how to deal with Python dictionaries and NumPy arrays. However, since the type of the data to be accessed isn’t known in advance, directly using standard operators has some optimization limits. For production code, we recommend that you take advantage of the optimized pandas data access methods exposed in this chapter.
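
One way to see the overhead for yourself is to time a general-purpose accessor against a scalar accessor. The sketch below uses the standard library's `timeit` on a hypothetical frame `df_demo`. Timings depend on the machine and the pandas version, so no particular ratio is assumed here; the only thing checked is that both accessors return the same value.

```python
import timeit

import pandas as pd

# Hypothetical example frame, modelled on the one used in this notebook.
df_demo = pd.DataFrame({"c1": [11, 12, 13, 14]}, index=["r1", "r2", "r3", "r4"])

# Time repeated lookups of the same cell with a general and a scalar accessor.
loc_seconds = timeit.timeit(lambda: df_demo.loc["r3", "c1"], number=2000)
at_seconds = timeit.timeit(lambda: df_demo.at["r3", "c1"], number=2000)

# Both accessors must agree on the value they return.
same_value = bool(df_demo.loc["r3", "c1"] == df_demo.at["r3", "c1"])

print(f"loc: {loc_seconds:.4f}s  at: {at_seconds:.4f}s  same value: {same_value}")
```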

In [2]:
import pandas as pd
import numpy as np
from IPython.display import HTML, display
import tabulate  # third-party package used below to build the HTML comparison tables

d = "Returns DataFrame"
se = " Returns Series"
bglb = 'background: lightblue'
bglg = 'background: lightgreen'
t = ['DataFrame[ ] Methods','DataFrame.loc[ ]']

data ={'c1':[11,12,13,14],'c2':['21','22','23','24'],'c3':['31','32','33','34']}
df = pd.DataFrame(data,index=['r1','r2','r3','r4'])
df_auto_index = pd.DataFrame(data)
s = df['c1']
df

    c1  c2  c3
r1  11  21  31
r2  12  22  32
r3  13  23  33
r4  14  24  34


In [3]:
def test_var_args(f_arg, *argv):
    print("first normal arg:", f_arg)
    for arg in argv:
        print("another arg through *argv :", arg)

test_var_args('yasoob','python','eggs','test')

first normal arg: yasoob
another arg through *argv : python
another arg through *argv : eggs
another arg through *argv : test


In [4]:
#tabulate.PRESERVE_WHITESPACE = False


dc1 = df.style.apply(lambda x: [ bglb if x.name == 'c1' else '' for i in x],axis=0)
dc2 = df.style.apply(lambda x: [ bglg if x.name != 'c3' else '' for i in x])
dc3 = df.style.apply(lambda x: [ bglg if any([x.name == 'r1',x.name=='r2']) else '' for i in x], 
               axis=1)
dc4 = df.style.apply(lambda x: [ bglg if any([x.name == 'r3',x.name=='r4']) else '' for i in x], 
               axis=1)
dc5 = df.style.apply(lambda x: [ bglb if any([x.name == 'r1']) else '' for i in x], 
               axis=1)
dc6 = df.style.highlight_min(axis=0)
sc1 = tabulate.tabulate(df.c1.to_frame(),tablefmt='html')
sc2 = tabulate.tabulate(df[['c1','c2']],tablefmt='html',headers=df[['c1','c2']].columns.tolist())
sc3 = tabulate.tabulate(df[0:2],tablefmt='html', headers=df[0:2].columns.tolist())
sc4 = tabulate.tabulate(df[df['c1'] > 12],tablefmt='html', headers=df[df['c1'] > 12].columns.tolist())


table = [t,
         ["DataFrame['column_one']"+" \n "+dc1.render()+"---"+se+"---","DataFrame.loc['row_one']"+dc5.render()+"---"+se+"---"],
         ["DataFrame[['column_one','column_two']]"+dc2.render()+"---"+d+"---","DataFrame.loc[['row_one','row_two']]"+dc3.render()+"---"+d+"---"],
         ["DataFrame[0:2]"+dc3.render()+"---"+d+"---","df.loc['row_one':'row_two']"+dc3.render()+"---"+d+"---"],
         ["DataFrame[DataFrame['c1'] > 12]"+dc4.render()+"---"+d+"---","df.loc['a','c1']"+dc6.render()+"---"+"Returns Scalar Value"+"---"],   
         
        ]

display(HTML(tabulate.tabulate(table, tablefmt='html',numalign='center',stralign='None')))




[Output: an HTML table with the columns "DataFrame[ ] Methods" and "DataFrame.loc[ ]". Each cell shows the example DataFrame (rows r1 to r4, columns c1 to c3) with the selected rows or columns highlighted, followed by a note on whether the expression returns a Series, a DataFrame, or a scalar value.]

In [7]:
def see(df):
    # Placeholder helper: returns the DataFrame unchanged.
    return df

In [57]:
df.loc['a','c1']

11

<table style="width:100%">
<tr>
    <th>DataFrame[  ]</th>
    <th>DataFrame.loc[  ]</th>
    <th>DataFrame.iloc[  ]</th>
</tr>
<tr>
    <td>DataFrame['column_one']<br><br><u>Single Label</u> returns <b>Series</b>
    </td>
    <td><u>Single Label</u><br>DataFrame.loc['row_one'] <br> <br>(or integer label)<br>returns row <b>Series</b></td>
    <td><u>Single Label</u><br>DataFrame.iloc[index_number] <br> <br>returns row <b>Series</b></td>
</tr>
<tr>
    <td>DataFrame[['column_one','column_two']]<br><br><u>Multiple Label</u> returns filtered columns <b>DataFrame</b> </td>
    <td><u>Multiple Label</u><br>DataFrame.loc[['row_one','row_two']]<br><br> (or integer labels)<br> returns filtered rows <b>DataFrame</b></td>
    <td><u>Multiple Rows</u><br>DataFrame.iloc[[2,3]]<br><br><br> returns filtered rows <b>DataFrame</b></td>
</tr>
<tr>
    <td><u>Slicing</u> returns first 2 rows <b>DataFrame</b><br>DataFrame[0:2]<br><br> </td>
    <td><u>Slicing</u><br>DataFrame.loc['row_two':'row_three']<br><br> (or integer labels)<br> returns both end points, so two rows <b>DataFrame</b></td>
    <td><u>Slicing</u><br>DataFrame.iloc[1:4]<br><br> <br> returns rows at positions 1 through 3 <b>DataFrame</b></td>
</tr>
<tr>
    <td><u>Boolean</u><br>DataFrame[DataFrame['c1'] > 12]<br><br> returns filtered rows <b>DataFrame</b><br> by evaluating scalar numbers </td>
    <td><u>By [index,column]</u><br>DataFrame.loc[index,'column_name']<br><br> returns <b>Scalar</b> Value<br>  </td>
    <td><u>By [index,column]</u><br>DataFrame.iloc[index,column_index]<br><br> returns <b>Scalar</b> Value<br>  </td>
</tr>
</table> 
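
The return types summarized in the table above can be checked directly. The following is a small sketch on a hypothetical frame `df_demo` (the row labels `r1` to `r4` and column names are assumptions chosen to mirror this notebook's example data); it evaluates one expression per table cell and prints the type of each result.

```python
import pandas as pd

# Hypothetical example frame, modelled on the one used in this notebook.
df_demo = pd.DataFrame(
    {
        "c1": [11, 12, 13, 14],
        "c2": ["21", "22", "23", "24"],
        "c3": ["31", "32", "33", "34"],
    },
    index=["r1", "r2", "r3", "r4"],
)

# One entry per cell of the comparison table.
results = {
    "df['c1']": df_demo["c1"],
    "df[['c1', 'c2']]": df_demo[["c1", "c2"]],
    "df[0:2]": df_demo[0:2],
    "df[df['c1'] > 12]": df_demo[df_demo["c1"] > 12],
    "df.loc['r1']": df_demo.loc["r1"],
    "df.loc[['r1', 'r2']]": df_demo.loc[["r1", "r2"]],
    "df.loc['r2':'r3']": df_demo.loc["r2":"r3"],
    "df.loc['r1', 'c1']": df_demo.loc["r1", "c1"],
    "df.iloc[0]": df_demo.iloc[0],
    "df.iloc[[2, 3]]": df_demo.iloc[[2, 3]],
    "df.iloc[1:4]": df_demo.iloc[1:4],
    "df.iloc[0, 0]": df_demo.iloc[0, 0],
}

for expression, result in results.items():
    print(f"{expression:<22} -> {type(result).__name__}")
```

Note that the positional slices exclude their stop position, while the label slice includes both end points.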

In [8]:
df['c1']

a    11
b    12
c    13
d    14
Name: c1, dtype: int64

In [9]:
df[['c1','c2']].columns.tolist()

['c1', 'c2']

 <table style="width:100%">
  <tr>
    <th>Series[  ]</th>
    <th>Series.loc[  ]</th>
    <th>Series.iloc[  ]</th>
  </tr>
  <tr>
    <td><u>Single Label</u><br>Series['label_one'] <br><br>returns <b>Scalar</b> Value</td>
    <td><u>Single Label</u><br>Series.loc['label_one'] <br> <br>(or integer label)<br>returns <b>Scalar</b> Value</td>
    <td><u>Single Position</u><br>Series.iloc[index_number] <br> <br>returns <b>Scalar</b> Value</td>
  </tr>
  <tr>
    <td><u>Multiple Label</u><br>Series[['label_one','label_two']]<br><br> returns filtered <b>Series</b></td>
    <td><u>Multiple Label</u><br>Series.loc[['label_one','label_two']]<br><br> (or integer labels)<br> returns filtered <b>Series</b></td>
    <td><u>Multiple Positions</u><br>Series.iloc[[2,3]]<br><br><br> returns filtered <b>Series</b></td>
  </tr>
  <tr>
    <td><u>Slicing</u><br>Series[0:3]<br><br> returns first 3 values <b>Series</b></td>
    <td><u>Slicing</u><br>Series.loc['label_two':'label_three']<br><br> (or integer labels)<br> returns both end points <b>Series</b></td>
    <td><u>Slicing</u><br>Series.iloc[1:4]<br><br> <br> returns values at positions 1 through 3 <b>Series</b></td>
  </tr>
  <tr>
    <td><u>Boolean</u><br>Series[Series > 12]<br><br> returns filtered <b>Series</b><br> by evaluating scalar numbers </td>
    <td><u>Boolean</u><br>Series.loc[Series > 12]<br><br> returns filtered <b>Series</b><br>  </td>
    <td><u>Boolean</u><br>Series.iloc[(Series > 12).values]<br><br> returns filtered <b>Series</b><br>  </td>
  </tr>
</table> 

In [60]:
#tabulate.PRESERVE_WHITESPACE = False


dc1 = df.style.apply(lambda x: [ bglb if x.name == 'c1' else '' for i in x])
dc2 = df.style.apply(lambda x: [ bglg if x.name != 'c3' else '' for i in x])
dc3 = df.style.apply(lambda x: [ bglg if any([x.name == 'a',x.name=='b']) else '' for i in x], 
               axis=1)
dc4 = df.style.apply(lambda x: [ bglg if any([x.name == 'c',x.name=='d']) else '' for i in x], 
               axis=1)
dc5 = df.style.apply(lambda x: [ bglb if any([x.name == 'a']) else '' for i in x], 
               axis=1)
dc6 = df.style.highlight_min(axis=0)
sc1 = tabulate.tabulate(df.c1.to_frame(),tablefmt='html')
sc2 = tabulate.tabulate(df[['c1','c2']],tablefmt='html',headers=df[['c1','c2']].columns.tolist())
sc3 = tabulate.tabulate(df[0:2],tablefmt='html', headers=df[0:2].columns.tolist())
sc4 = tabulate.tabulate(df[df['c1'] > 12],tablefmt='html', headers=df[df['c1'] > 12].columns.tolist())


table = [t,
         ["DataFrame['column_one']"+" \n "+dc1.render()+"---"+se+"---","DataFrame.loc['row_one']"+dc5.render()+"---"+se+"---"],
         ["DataFrame[['column_one','column_two']]"+dc2.render()+"---"+d+"---","DataFrame.loc[['row_one','row_two']]"+dc3.render()+"---"+d+"---"],
         ["DataFrame[0:2]"+dc3.render()+"---"+d+"---","df.loc['row_one':'row_two']"+dc3.render()+"---"+d+"---"],
         ["DataFrame[DataFrame['c1'] > 12]"+dc4.render()+"---"+d+"---","df.loc['a','c1']"+dc6.render()+"---"+"Returns Scalar Value"+"---"],   
         
        ]

display(HTML(tabulate.tabulate(table, tablefmt='html',numalign='center',stralign='None')))


[Output: an HTML table with the columns "DataFrame[ ] Methods" and "DataFrame.loc[ ]". Each cell shows the example DataFrame (rows a to d, columns c1 to c3) with the selected rows or columns highlighted, followed by a note on whether the expression returns a Series, a DataFrame, or a scalar value.]


In [11]:
s.iloc[0]  # positional access; s[0] on a label-indexed Series is deprecated in recent pandas

11

## Indexing

There are quite a few ways to get data out of 
a DataFrame or a Series.

By far the most popular way is to use brackets ***[ ]***.

Getting values from an object with multi-axes selection uses the following notation (using .loc as an example, but the same applies to .iloc). Any of the axes accessors may be the null slice :. Axes left out of the specification are assumed to be :. (e.g. df.loc['a'] is equivalent to df.loc['a', :])
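
A quick sketch of the null-slice rule, using a hypothetical all-numeric frame `df_demo`: leaving an axis out gives the same result as spelling it out with `:`.

```python
import pandas as pd

# Hypothetical example frame; names and values are illustrative only.
df_demo = pd.DataFrame(
    {"c1": [11, 12, 13], "c2": [21, 22, 23]},
    index=["r1", "r2", "r3"],
)

# The column axis is omitted, so it is treated as the null slice.
row_implicit = df_demo.loc["r1"]
row_explicit = df_demo.loc["r1", :]

# The row axis can be the null slice as well, which selects a whole column.
column_all_rows = df_demo.loc[:, "c1"]

print(row_implicit.equals(row_explicit))
print(column_all_rows.equals(df_demo["c1"]))
```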

 <table style="width:100%">
  <tr>
    <th>Object</th>
    <th>Syntax</th>
    <th>Returns</th>
  </tr>
  <tr>
    <td>DataFrame</td>
    <td>DataFrame[colname]</td>
    <td>Series corresponding to colname</td>
  </tr>
  <tr>
    <td>Series</td>
    <td>series[label]</td>
    <td>Scalar value</td>
  </tr>
</table> 

In [12]:
first_col = df['c1']
first_col

a    11
b    12
c    13
d    14
Name: c1, dtype: int64

In [13]:
first_thing = first_col.iloc[0]  # first element by position
first_thing

11

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:

In [14]:
two_columns = df[['c1','c2']]
two_columns

   c1  c2
a  11  21
b  12  22
c  13  23
d  14  24
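
The three behaviours mentioned above (column order follows the list, a missing column raises, and a list of columns can be assigned to) can be sketched as follows. The frame `df_demo` and the column name `missing` are hypothetical. The swap uses `to_numpy()` so that the assignment is made by position rather than by column label.

```python
import pandas as pd

# Hypothetical example frame; names and values are illustrative only.
df_demo = pd.DataFrame(
    {"c1": [11, 12], "c2": [21, 22], "c3": [31, 32]},
    index=["r1", "r2"],
)

# Columns come back in the order given in the list.
reordered = df_demo[["c3", "c1"]]

# Asking for a column that does not exist raises KeyError.
try:
    df_demo[["c1", "missing"]]
    missing_raised = False
except KeyError:
    missing_raised = True

# Setting several columns at once: swap the contents of c1 and c2.
df_demo[["c1", "c2"]] = df_demo[["c2", "c1"]].to_numpy()

print(reordered.columns.tolist(), missing_raised)
print(df_demo)
```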


.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:

    A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)

    A list or array of labels ['a', 'b', 'c']

    A slice object with labels 'a':'f', (note that contrary to usual python slices, both the start and the stop are included!)

    A boolean array

    A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

 <table style="width:100%">
  <tr>
    <th>Object</th>
    <th>Syntax</th>
    <th>Returns</th>
  </tr>
  <tr>
    <td>DataFrame</td>
    <td>df.loc[row_indexer,column_indexer]</td>
    <td>Scalar value</td>
  </tr>
  <tr>
    <td>Series</td>
    <td>s.loc[indexer]</td>
    <td>Scalar value</td>
  </tr>
</table> 


In [None]:
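# df has string row labels, so integer labels are looked up in df_auto_index instead
df_auto_index.loc[0,'c1']

In [None]:
df_auto_index['c1'].loc[1]

In [None]:
df_auto_index.loc[[0,1],'c1']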
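
The allowed .loc inputs listed earlier can be sketched one by one. The frame `df_demo` and the label `missing` are hypothetical; the point to notice is that the label slice includes both of its end points.

```python
import pandas as pd

# Hypothetical example frame; names and values are illustrative only.
df_demo = pd.DataFrame(
    {"c1": [11, 12, 13, 14], "c2": [21, 22, 23, 24]},
    index=["r1", "r2", "r3", "r4"],
)

single = df_demo.loc["r1", "c1"]            # single labels -> scalar
listed = df_demo.loc[["r1", "r3"], "c1"]    # list of labels -> Series
sliced = df_demo.loc["r2":"r3"]             # label slice, both ends included
masked = df_demo.loc[df_demo["c1"] > 12]    # boolean array
via_callable = df_demo.loc[lambda frame: frame["c1"] % 2 == 0]  # callable

# A label that is not in the index raises KeyError.
try:
    df_demo.loc["missing"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

print(single, listed.tolist(), sliced.index.tolist())
print(masked.index.tolist(), via_callable.index.tolist(), key_error_raised)
```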

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with python/numpy slice semantics). Allowed inputs are:

    An integer e.g. 5

    A list or array of integers [4, 3, 0]

    A slice object with ints 1:7

    A boolean array

    A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

In [None]:
df.iloc[0]
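
A sketch of the .iloc inputs described above, again on a hypothetical frame `df_demo`. It also shows the difference between an out-of-bounds slice, which is simply truncated, and an out-of-bounds single position, which raises IndexError.

```python
import pandas as pd

# Hypothetical example frame; names and values are illustrative only.
df_demo = pd.DataFrame(
    {"c1": [11, 12, 13, 14], "c2": [21, 22, 23, 24]},
    index=["r1", "r2", "r3", "r4"],
)

first_row = df_demo.iloc[0]                           # single integer -> row Series
picked = df_demo.iloc[[3, 0]]                         # list of positions, in that order
sliced = df_demo.iloc[1:3]                            # stop position is excluded
beyond = df_demo.iloc[2:10]                           # out-of-bounds slice is truncated
masked = df_demo.iloc[[True, False, True, False]]     # boolean array

# A single position past the end raises IndexError.
try:
    df_demo.iloc[10]
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(first_row.tolist(), picked.index.tolist(), sliced.index.tolist())
print(beyond.index.tolist(), masked.index.tolist(), index_error_raised)
```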

With DataFrame, slicing inside of [] ***slices the rows.*** This is provided largely as a convenience since it is such a common operation.

In [None]:
df[:1]

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.

Using a boolean vector to index a Series works exactly as in a numpy ndarray:
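
A minimal sketch of boolean filtering on a hypothetical Series `s_demo`, with the same mask applied to the underlying NumPy array for comparison. Each comparison is wrapped in parentheses because |, & and ~ bind more tightly than < and >.

```python
import pandas as pd

# Hypothetical example Series; labels and values are illustrative only.
s_demo = pd.Series([11, 12, 13, 14], index=["a", "b", "c", "d"])

either = s_demo[(s_demo < 12) | (s_demo > 13)]   # or
both = s_demo[(s_demo > 11) & (s_demo < 14)]     # and
negated = s_demo[~(s_demo > 12)]                 # not

# The same kind of mask works on a plain NumPy array.
arr = s_demo.to_numpy()
either_array = arr[(arr < 12) | (arr > 13)]

print(either.tolist(), both.tolist(), negated.tolist())
print(either_array.tolist())
```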