This notebook is based on the surprisingly helpful "10 minutes to pandas" tutorial from the pandas [official documentation](https://pandas.pydata.org/docs/user_guide/10min.html).

In [1]:
!pip install pandas 


In [2]:
import numpy as np
import pandas as pd

# Basic data structures in pandas

pandas provides two types of classes for handling data:
1. `Series`: a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.
2. `DataFrame`: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.
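As a quick illustration of how the two structures relate, here is a minimal sketch (the names `ages` and `people` and their values are made up for the example). Each column of a DataFrame is itself a Series that shares the DataFrame's row index:

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
ages = pd.Series([31, 27, 45], index=["ann", "bob", "cy"], name="age")

# A DataFrame: a table of columns that share one row index
# (the list for "city" is matched to the Series index by position)
people = pd.DataFrame({"age": ages, "city": ["Oslo", "Lima", "Pune"]})

# Selecting a single column gives back a Series with the same index
col = people["age"]
print(type(col).__name__)  # Series
print(people.shape)        # (3, 2)
```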

## Object creation
Creating a Series by passing a list of values, letting pandas create a default RangeIndex.

In [4]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])  # np.nan forces the integers to float64
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Creating a DataFrame by passing a NumPy array with a `datetime` index using `date_range()` and labeled columns:

In [5]:
dates = pd.date_range("20130101", periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [6]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

,A,B,C,D
2013-01-01,-2.142933,-0.919831,0.388967,-0.096977
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-03,-1.823859,0.213019,-1.083522,0.573589
2013-01-04,0.808892,-0.871928,0.617026,-0.273238
2013-01-05,-1.393803,0.860246,-0.094055,0.092548
2013-01-06,1.229415,-0.712939,0.811714,3.010295


In [7]:
# Creating a DataFrame by passing a dictionary of objects where the keys are the column labels and the values are the column values.
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

df2

,A,B,C,D,E,F
0,1.0,2013-01-02,1.0,3,test,foo
1,1.0,2013-01-02,1.0,3,train,foo
2,1.0,2013-01-02,1.0,3,test,foo
3,1.0,2013-01-02,1.0,3,train,foo


In [8]:
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object

## Viewing Data
Use `DataFrame.head()` and `DataFrame.tail()` to view the top and bottom rows of the frame respectively:

In [9]:
df.head()

,A,B,C,D
2013-01-01,-2.142933,-0.919831,0.388967,-0.096977
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-03,-1.823859,0.213019,-1.083522,0.573589
2013-01-04,0.808892,-0.871928,0.617026,-0.273238
2013-01-05,-1.393803,0.860246,-0.094055,0.092548


In [10]:
df.tail(3)

,A,B,C,D
2013-01-04,0.808892,-0.871928,0.617026,-0.273238
2013-01-05,-1.393803,0.860246,-0.094055,0.092548
2013-01-06,1.229415,-0.712939,0.811714,3.010295


In [11]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [12]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [13]:
# Return a NumPy representation of the underlying data with DataFrame.to_numpy() without the index or column labels:
df.to_numpy()

array([[-2.14293333, -0.91983131,  0.3889666 , -0.09697667],
       [-0.65240833, -1.29519877,  0.61031554, -0.74948652],
       [-1.82385915,  0.21301945, -1.08352222,  0.57358886],
       [ 0.80889188, -0.87192784,  0.61702556, -0.27323843],
       [-1.39380279,  0.86024603, -0.0940551 ,  0.09254834],
       [ 1.22941506, -0.71293859,  0.811714  ,  3.01029506]])

__Note__: NumPy arrays have one dtype for the entire array while pandas DataFrames have one dtype per column. When you call `DataFrame.to_numpy()`, pandas will find the NumPy dtype that can hold all of the dtypes in the DataFrame. If the common data type is object, `DataFrame.to_numpy()` will require copying data.
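A small sketch of that rule, using throwaway frames (`nums` and `mixed` are invented names). Numeric columns are upcast to a common numeric dtype, while adding a string column leaves `object` as the only dtype that can hold everything:

```python
import numpy as np
import pandas as pd

# Two numeric columns with different dtypes
nums = pd.DataFrame({
    "i": np.array([1, 2], dtype="int32"),
    "f": np.array([0.5, 1.5], dtype="float64"),
})
# int32 and float64 share float64 as a common dtype
print(nums.to_numpy().dtype)   # float64

# A string column leaves object as the only common dtype
mixed = nums.assign(s=["x", "y"])
print(mixed.to_numpy().dtype)  # object
```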

In [14]:
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object

In [15]:
df2.to_numpy()

array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)

`DataFrame.describe()` shows a quick statistical summary of your data:



In [16]:
df.describe()

,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.662449,-0.454439,0.208407,0.426122
std,1.401405,0.817127,0.705074,1.338339
min,-2.142933,-1.295199,-1.083522,-0.749487
25%,-1.716345,-0.907855,0.0267,-0.229173
50%,-1.023106,-0.792433,0.499641,-0.002214
75%,0.443567,-0.01847,0.615348,0.453329
max,1.229415,0.860246,0.811714,3.010295


In [19]:
df

,A,B,C,D
2013-01-01,-2.142933,-0.919831,0.388967,-0.096977
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-03,-1.823859,0.213019,-1.083522,0.573589
2013-01-04,0.808892,-0.871928,0.617026,-0.273238
2013-01-05,-1.393803,0.860246,-0.094055,0.092548
2013-01-06,1.229415,-0.712939,0.811714,3.010295


In [18]:
# DataFrame.T transposes the data (rows become columns):
df.T

,2013-01-01,2013-01-02,2013-01-03,2013-01-04,2013-01-05,2013-01-06
A,-2.142933,-0.652408,-1.823859,0.808892,-1.393803,1.229415
B,-0.919831,-1.295199,0.213019,-0.871928,0.860246,-0.712939
C,0.388967,0.610316,-1.083522,0.617026,-0.094055,0.811714
D,-0.096977,-0.749487,0.573589,-0.273238,0.092548,3.010295


In [20]:
# DataFrame.sort_index() sorts by an axis:
df.sort_index(axis=1, ascending=False)


,D,C,B,A
2013-01-01,-0.096977,0.388967,-0.919831,-2.142933
2013-01-02,-0.749487,0.610316,-1.295199,-0.652408
2013-01-03,0.573589,-1.083522,0.213019,-1.823859
2013-01-04,-0.273238,0.617026,-0.871928,0.808892
2013-01-05,0.092548,-0.094055,0.860246,-1.393803
2013-01-06,3.010295,0.811714,-0.712939,1.229415


In [21]:
# DataFrame.sort_values() sorts by values:
df.sort_values(by="B")

,A,B,C,D
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-01,-2.142933,-0.919831,0.388967,-0.096977
2013-01-04,0.808892,-0.871928,0.617026,-0.273238
2013-01-06,1.229415,-0.712939,0.811714,3.010295
2013-01-03,-1.823859,0.213019,-1.083522,0.573589
2013-01-05,-1.393803,0.860246,-0.094055,0.092548


## Selection

While standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code, we recommend the optimized pandas data access methods:
- `DataFrame.at()`
- `DataFrame.iat()`
- `DataFrame.loc()`
- `DataFrame.iloc()`

For a DataFrame, passing a single label selects a column and yields a Series equivalent to `df.A`:



In [22]:
df["A"]


2013-01-01   -2.142933
2013-01-02   -0.652408
2013-01-03   -1.823859
2013-01-04    0.808892
2013-01-05   -1.393803
2013-01-06    1.229415
Freq: D, Name: A, dtype: float64

For a DataFrame, passing a slice (`:`) selects the matching rows:



In [23]:
df[0:3]

,A,B,C,D
2013-01-01,-2.142933,-0.919831,0.388967,-0.096977
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-03,-1.823859,0.213019,-1.083522,0.573589


In [24]:
df["20130102":"20130104"]

,A,B,C,D
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-03,-1.823859,0.213019,-1.083522,0.573589
2013-01-04,0.808892,-0.871928,0.617026,-0.273238


## Selection by label

See more in Selection by Label using `DataFrame.loc()` or `DataFrame.at()`.

Selecting a row matching a label:

In [25]:
df.loc[dates[0]]

A   -2.142933
B   -0.919831
C    0.388967
D   -0.096977
Name: 2013-01-01 00:00:00, dtype: float64

Selecting all rows (`:`) with a selection of column labels:



In [26]:
df.loc[:, ["A", "B"]]

,A,B
2013-01-01,-2.142933,-0.919831
2013-01-02,-0.652408,-1.295199
2013-01-03,-1.823859,0.213019
2013-01-04,0.808892,-0.871928
2013-01-05,-1.393803,0.860246
2013-01-06,1.229415,-0.712939


In [27]:
# For label slicing, both endpoints are included:
df.loc["20130102":"20130104", ["A", "B"]]

,A,B
2013-01-02,-0.652408,-1.295199
2013-01-03,-1.823859,0.213019
2013-01-04,0.808892,-0.871928
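The inclusive endpoint is the main difference from positional slicing with `iloc`, which follows the usual Python rule of excluding the stop position. A minimal sketch on a small invented frame (`frame`, `by_label`, and `by_position` are illustration names):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"v": np.arange(5)}, index=list("abcde"))

# Label-based slice: both "b" and "d" are included
by_label = frame.loc["b":"d"]
# Position-based slice: like Python lists, the stop position 3 is excluded
by_position = frame.iloc[1:3]

print(by_label.index.tolist())     # ['b', 'c', 'd']
print(by_position.index.tolist())  # ['b', 'c']
```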


In [28]:
# Selecting a single row and column label returns a scalar:
df.loc[dates[0], "A"]

-2.1429333283118814

In [29]:
# DataFrame.at gives faster access to a single scalar (equivalent to the previous cell):
df.at[dates[0], "A"]

-2.1429333283118814

## Selection by position

See more in Selection by Position using `DataFrame.iloc()` or `DataFrame.iat()`.

Select via the position of the passed integers:

In [30]:
df.iloc[3]

A    0.808892
B   -0.871928
C    0.617026
D   -0.273238
Name: 2013-01-04 00:00:00, dtype: float64

In [31]:
df.iloc[3:5, 0:2]

,A,B
2013-01-04,0.808892,-0.871928
2013-01-05,-1.393803,0.860246


In [33]:
df.iloc[[1, 2, 4], [0, 2]]

,A,C
2013-01-02,-0.652408,0.610316
2013-01-03,-1.823859,-1.083522
2013-01-05,-1.393803,-0.094055


In [34]:
df.iloc[1:3, :]

,A,B,C,D
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487
2013-01-03,-1.823859,0.213019,-1.083522,0.573589


In [35]:
df.iloc[:, 1:3]

,B,C
2013-01-01,-0.919831,0.388967
2013-01-02,-1.295199,0.610316
2013-01-03,0.213019,-1.083522
2013-01-04,-0.871928,0.617026
2013-01-05,0.860246,-0.094055
2013-01-06,-0.712939,0.811714


In [36]:
df.iloc[1, 1]

-1.2951987650123933

In [37]:
# DataFrame.iat gives faster access to a single scalar (equivalent to the previous cell):
df.iat[1, 1]

-1.2951987650123933

## Boolean indexing

In [38]:
df[df["A"] > 0]

,A,B,C,D
2013-01-04,0.808892,-0.871928,0.617026,-0.273238
2013-01-06,1.229415,-0.712939,0.811714,3.010295


In [39]:
df[df > 0]

,A,B,C,D
2013-01-01,,,0.388967,
2013-01-02,,,0.610316,
2013-01-03,,0.213019,,0.573589
2013-01-04,0.808892,,0.617026,
2013-01-05,,0.860246,,0.092548
2013-01-06,1.229415,,0.811714,3.010295


In [43]:
df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2

,A,B,C,D,E
2013-01-01,-2.142933,-0.919831,0.388967,-0.096977,one
2013-01-02,-0.652408,-1.295199,0.610316,-0.749487,one
2013-01-03,-1.823859,0.213019,-1.083522,0.573589,two
2013-01-04,0.808892,-0.871928,0.617026,-0.273238,three
2013-01-05,-1.393803,0.860246,-0.094055,0.092548,four
2013-01-06,1.229415,-0.712939,0.811714,3.010295,three


In [44]:
# filtering using isin

df2[df2["E"].isin(["two", "four"])]

,A,B,C,D,E
2013-01-03,-1.823859,0.213019,-1.083522,0.573589,two
2013-01-05,-1.393803,0.860246,-0.094055,0.092548,four
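Boolean masks can also be combined with `&` (and), `|` (or), and `~` (not). A small sketch on an invented frame; note that each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `==`:

```python
import pandas as pd

frame = pd.DataFrame({"x": [1, -2, 3, -4], "tag": ["a", "b", "a", "b"]})

# Parentheses are required around each comparison
both = frame[(frame["x"] > 0) & (frame["tag"] == "a")]
either = frame[(frame["x"] > 2) | (frame["tag"] == "b")]
# ~ inverts a mask, here giving the rows NOT tagged "a"
neither = frame[~frame["tag"].isin(["a"])]

print(both["x"].tolist())     # [1, 3]
print(either["x"].tolist())   # [-2, 3, -4]
print(neither["x"].tolist())  # [-2, -4]
```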


## Setting
Setting a new column automatically aligns the data by the indexes:



In [46]:
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))

In [47]:
s1

2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64
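The cells above build `s1` but never assign it to `df`. A minimal sketch of what the alignment looks like, run on a small stand-in frame (its single column `A` is invented here) so that the `df` used in the following cells is left unchanged. Because `s1` starts one day later, the first row has no matching label and gets `NaN`, while the extra label `2013-01-07` is dropped:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
frame = pd.DataFrame({"A": np.arange(6.0)}, index=dates)

# s1 starts one day later and runs one day past the end of frame
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))

# Assignment aligns on the index labels, not on position
frame["F"] = s1
print(frame["F"].tolist())  # [nan, 1.0, 2.0, 3.0, 4.0, 5.0]
```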

In [49]:
# setting by label
df.at[dates[0], "A"] = 0


In [50]:
# setting by position
df.iat[0, 1] = 0

In [51]:
# setting by assigning a NumPy array
df.loc[:, "D"] = np.array([5] * len(df))


In [52]:
df

,A,B,C,D
2013-01-01,0.0,0.0,0.388967,5.0
2013-01-02,-0.652408,-1.295199,0.610316,5.0
2013-01-03,-1.823859,0.213019,-1.083522,5.0
2013-01-04,0.808892,-0.871928,0.617026,5.0
2013-01-05,-1.393803,0.860246,-0.094055,5.0
2013-01-06,1.229415,-0.712939,0.811714,5.0


In [53]:
df2 = df.copy()
# Setting through a boolean mask: negate every positive value
df2[df2 > 0] = -df2
df2

,A,B,C,D
2013-01-01,0.0,0.0,-0.388967,-5.0
2013-01-02,-0.652408,-1.295199,-0.610316,-5.0
2013-01-03,-1.823859,-0.213019,-1.083522,-5.0
2013-01-04,-0.808892,-0.871928,-0.617026,-5.0
2013-01-05,-1.393803,-0.860246,-0.094055,-5.0
2013-01-06,-1.229415,-0.712939,-0.811714,-5.0


## Missing data

For NumPy data types, `np.nan` represents missing data. 
It is by default not included in computations. 

Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data:

In [54]:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])

In [56]:
df1.loc[dates[0] : dates[1], "E"] = 1

In [57]:
df1

,A,B,C,D,E
2013-01-01,0.0,0.0,0.388967,5.0,1.0
2013-01-02,-0.652408,-1.295199,0.610316,5.0,1.0
2013-01-03,-1.823859,0.213019,-1.083522,5.0,
2013-01-04,0.808892,-0.871928,0.617026,5.0,


In [58]:
# DataFrame.dropna() drops any rows that have missing data:
df1.dropna(how="any")

,A,B,C,D,E
2013-01-01,0.0,0.0,0.388967,5.0,1.0
2013-01-02,-0.652408,-1.295199,0.610316,5.0,1.0


In [59]:
# DataFrame.fillna() fills missing data:
df1.fillna(value=5)

,A,B,C,D,E
2013-01-01,0.0,0.0,0.388967,5.0,1.0
2013-01-02,-0.652408,-1.295199,0.610316,5.0,1.0
2013-01-03,-1.823859,0.213019,-1.083522,5.0,5.0
2013-01-04,0.808892,-0.871928,0.617026,5.0,5.0


In [60]:
# pd.isna() returns a boolean mask that is True where values are NaN:
pd.isna(df1)

,A,B,C,D,E
2013-01-01,False,False,False,False,False
2013-01-02,False,False,False,False,False
2013-01-03,False,False,False,False,True
2013-01-04,False,False,False,False,True
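To see what "not included in computations" means in practice, here is a short sketch on an invented Series: reductions such as `sum()` and `mean()` skip `NaN` by default (`skipna=True`), and passing `skipna=False` lets the missing value propagate instead:

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

# Reductions skip NaN by default
print(values.sum())    # 4.0
print(values.mean())   # 2.0, i.e. (1 + 3) / 2
print(values.count())  # 2 non-missing values

# Opting out propagates the missing value
print(values.mean(skipna=False))  # nan
```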


## Operations

In [63]:
df

,A,B,C,D
2013-01-01,0.0,0.0,0.388967,5.0
2013-01-02,-0.652408,-1.295199,0.610316,5.0
2013-01-03,-1.823859,0.213019,-1.083522,5.0
2013-01-04,0.808892,-0.871928,0.617026,5.0
2013-01-05,-1.393803,0.860246,-0.094055,5.0
2013-01-06,1.229415,-0.712939,0.811714,5.0


In [61]:
df.mean()


A   -0.305294
B   -0.301133
C    0.208407
D    5.000000
dtype: float64

In [62]:
df.mean(axis=1)

2013-01-01    1.347242
2013-01-02    0.915677
2013-01-03    0.576410
2013-01-04    1.388497
2013-01-05    1.093097
2013-01-06    1.582048
Freq: D, dtype: float64

Operating with another `Series` or `DataFrame` with a different index or column will align the result with the union of the index or column labels. In addition, pandas automatically broadcasts along the specified dimension and will fill unaligned labels with `np.nan`.

In [64]:
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

In [65]:
s

2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64

In [66]:
df.sub(s, axis="index")

,A,B,C,D
2013-01-01,,,,
2013-01-02,,,,
2013-01-03,-2.823859,-0.786981,-2.083522,4.0
2013-01-04,-2.191108,-3.871928,-2.382974,2.0
2013-01-05,-6.393803,-4.139754,-5.094055,0.0
2013-01-06,,,,
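The same alignment rule applies between two Series. A minimal sketch with invented labels: the result carries the union of both indexes, and labels present on only one side become `NaN` unless a `fill_value` is supplied to the method form (`Series.add`):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# The result index is the union of both indexes;
# "x" and "w" exist on only one side, so they become NaN
total = a + b
print(total)

# fill_value treats a missing side as 0 instead of producing NaN
filled = a.add(b, fill_value=0)
print(filled)
```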


## User defined functions

`DataFrame.agg()` and `DataFrame.transform()` apply a user-defined function that reduces or broadcasts its result, respectively.

In [67]:
df.agg(lambda x: np.mean(x) * 5.6)

A    -1.709646
B    -1.686346
C     1.167081
D    28.000000
dtype: float64

In [68]:
df.transform(lambda x: x * 101.2)

,A,B,C,D
2013-01-01,0.0,0.0,39.36342,506.0
2013-01-02,-66.023723,-131.074115,61.763933,506.0
2013-01-03,-184.574546,21.557568,-109.652448,506.0
2013-01-04,81.859858,-88.239098,62.442987,506.0
2013-01-05,-141.052842,87.056899,-9.518376,506.0
2013-01-06,124.416804,-72.149386,82.145457,506.0
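The difference shows up in the shape of the result. A short sketch on an invented frame: `agg` collapses each column to one value, while `transform` returns an object with the same shape as its input:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# agg: the function reduces each column to a single value (the range)
reduced = frame.agg(lambda col: col.max() - col.min())
# transform: the function returns one value per input row
centered = frame.transform(lambda col: col - col.mean())

print(reduced.tolist())        # [2.0, 2.0]
print(centered.shape)          # (3, 2)
print(centered["A"].tolist())  # [-1.0, 0.0, 1.0]
```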


## Value Counts

In [69]:
s = pd.Series(np.random.randint(0, 7, size=10))
s

0    2
1    4
2    2
3    3
4    2
5    0
6    5
7    3
8    5
9    6
dtype: int64

In [70]:
s.value_counts()

2    3
3    2
5    2
4    1
0    1
6    1
Name: count, dtype: int64

## String Methods

`Series` is equipped with a set of string processing methods in the `str` attribute that make it easy to operate on each element of the array, as in the code snippet below. See more at [Vectorized String Methods](https://pandas.pydata.org/docs/user_guide/text.html#text-string-methods).

In [71]:
s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
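A few more `str` methods, sketched on a subset of the values above. Missing entries stay `NaN` by default, and `str.contains` accepts `na=False` so its result can be used directly as a boolean mask; matching is case-sensitive unless `case=False` is passed:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Aaba", "Baca", np.nan, "cat"])

# Element-wise length; the missing entry stays NaN
lengths = s.str.len()
# Case-sensitive by default: only "Aaba" contains an uppercase "A"
upper_a = s.str.contains("A", na=False)
# case=False matches both "b" and "B"
any_b = s.str.contains("b", case=False, na=False)

print(lengths.tolist())     # [4.0, 4.0, nan, 3.0]
print(s[upper_a].tolist())  # ['Aaba']
print(s[any_b].tolist())    # ['Aaba', 'Baca']
```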

## Concat

pandas provides various facilities for easily combining `Series` and `DataFrame` objects, with various kinds of set logic for the indexes and relational algebra functionality for join/merge-type operations.

See the [Merging section](https://pandas.pydata.org/docs/user_guide/merging.html#merging).

Concatenating pandas objects together row-wise with `concat()`:

In [72]:
df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]
df1 = pd.concat(pieces)

In [73]:
df

,0,1,2,3
0,-0.738881,0.417405,1.018696,-1.173046
1,0.800269,-1.478119,-0.009138,-1.764565
2,-0.718518,-1.675768,-0.178054,0.125391
3,-1.082692,-0.711793,-0.872998,-0.558018
4,1.802446,1.512842,0.228687,-0.665983
5,0.955474,0.821356,-1.004355,-0.41521
6,0.162656,-1.871046,-0.233825,0.712478
7,-0.148448,0.827751,-0.584207,-1.013207
8,0.88342,-0.109492,1.247805,0.871534
9,-0.060214,2.142415,-0.106822,-1.209055


In [74]:
df1

,0,1,2,3
0,-0.738881,0.417405,1.018696,-1.173046
1,0.800269,-1.478119,-0.009138,-1.764565
2,-0.718518,-1.675768,-0.178054,0.125391
3,-1.082692,-0.711793,-0.872998,-0.558018
4,1.802446,1.512842,0.228687,-0.665983
5,0.955474,0.821356,-1.004355,-0.41521
6,0.162656,-1.871046,-0.233825,0.712478
7,-0.148448,0.827751,-0.584207,-1.013207
8,0.88342,-0.109492,1.247805,0.871534
9,-0.060214,2.142415,-0.106822,-1.209055


Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy, and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.
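A minimal sketch of the recommended pattern (the `records` list and its keys are invented for illustration): appending to a plain Python list is cheap, so collect the rows first and call the DataFrame constructor once at the end:

```python
import pandas as pd

# Accumulate plain Python records first (cheap list appends)...
records = []
for i in range(5):
    records.append({"step": i, "square": i * i})

# ...then build the DataFrame once, instead of growing it row by row
frame = pd.DataFrame(records)
print(frame.shape)  # (5, 2)
```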

## Join
`merge()` enables SQL-style join types along specific columns. See the Database-style joining section of the pandas user guide.


In [75]:
left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
left

,key,lval
0,foo,1
1,foo,2


In [76]:
right

,key,rval
0,foo,4
1,foo,5


In [78]:
pd.merge(left, right, on="key")

,key,lval,rval
0,foo,1,4
1,foo,1,5
2,foo,2,4
3,foo,2,5


In [79]:
left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
# merge on unique keys
pd.merge(left, right, on="key")

,key,lval,rval
0,foo,1,4
1,bar,2,5
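The examples above only use the default inner join. A short sketch of the `how` parameter with partially overlapping keys (the keys `baz` and `qux` are invented): `"inner"` keeps only keys found on both sides, `"left"` keeps every left key and fills missing right values with `NaN`, and `"outer"` keeps keys from either side:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar", "baz"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["foo", "bar", "qux"], "rval": [4, 5, 6]})

inner = pd.merge(left, right, on="key")  # default how="inner"
left_join = pd.merge(left, right, on="key", how="left")
outer = pd.merge(left, right, on="key", how="outer")

print(inner["key"].tolist())      # ['foo', 'bar']
print(left_join["key"].tolist())  # ['foo', 'bar', 'baz']
print(len(outer))                 # 4
```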


## Grouping
By “group by” we are referring to a process involving one or more of the following steps:
1. Splitting the data into groups based on some criteria
2. Applying a function to each group independently
3. Combining the results into a data structure

See the [Grouping section](https://pandas.pydata.org/docs/user_guide/groupby.html#groupby).
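To make the three steps concrete, here is a rough sketch that performs split, apply, and combine by hand with plain dictionaries and compares the result with `groupby` (the small frame is invented, and the by-hand version is only for exposition, not how pandas implements grouping internally):

```python
import pandas as pd

frame = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1.0, 2.0, 3.0, 4.0]})

# 1. Split: one sub-frame per distinct value of column A
groups = {key: frame[frame["A"] == key] for key in frame["A"].unique()}
# 2. Apply: reduce each group independently
sums = {key: sub["C"].sum() for key, sub in groups.items()}
# 3. Combine: collect the per-group results into a Series
manual = pd.Series(sums).sort_index()

# groupby performs all three steps in one call
via_groupby = frame.groupby("A")["C"].sum()
print(manual.to_dict())  # {'bar': 6.0, 'foo': 4.0}
```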

In [80]:
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": np.random.randn(8),
        "D": np.random.randn(8),
    }
)
df

,A,B,C,D
0,foo,one,0.397476,1.529295
1,bar,one,-0.55263,0.437685
2,foo,two,0.345544,0.259452
3,bar,three,0.028136,0.071908
4,foo,two,1.068656,-0.95348
5,bar,two,3.485188,-0.917795
6,foo,one,0.1245,1.111037
7,foo,three,0.529225,-0.089033


In [81]:
df.groupby("A")[["C", "D"]].sum()

A,C,D
bar,2.960693,-0.408202
foo,2.465401,1.857271


In [82]:
df.groupby(["A", "B"]).sum()  # grouping by two columns produces a MultiIndex

A,B,C,D
bar,one,-0.55263,0.437685
bar,three,0.028136,0.071908
bar,two,3.485188,-0.917795
foo,one,0.521976,2.640332
foo,three,0.529225,-0.089033
foo,two,1.414201,-0.694028
