## Row operations

In [2]:
import pandas as pd
testData = pd.read_csv("./data/startdata.csv",sep = ';', index_col = ['Date and time'], parse_dates = ['Date and time'], dayfirst = True)

To get one or more rows from the DataFrame as a NumPy array, use the **values** property with an index or a slice:

In [6]:
testData.values[3]

array(['Show', 61, 8, 'D'], dtype=object)

In [7]:
testData.values[3:5]

array([['Show', 61, 8, 'D'],
       ['Show', 26, 1, 'E']], dtype=object)
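Note that **values** returns a plain NumPy array, so the index and column labels are lost. As a possible alternative, **.iloc** selects rows by position and keeps the labels. The sketch below assumes a small hand-built frame named `sample` (a hypothetical stand-in for the CSV file, using the first five rows shown later in this notebook):

```python
import pandas as pd

# Hypothetical stand-in for startdata.csv: the first five rows shown in this notebook.
sample = pd.DataFrame(
    {
        "Visibility": ["Show", "Hide", "Show", "Show", "Show"],
        "Value": [30, 73, 44, 61, 26],
        "Value2": [42, 37, 51, 8, 1],
        "Value3": ["A", "B", "C", "D", "E"],
    },
    index=pd.DatetimeIndex(
        [
            "2019-01-01 00:00",
            "2019-01-01 06:00",
            "2019-01-01 12:00",
            "2019-01-01 18:00",
            "2019-01-02 00:00",
        ],
        name="Date and time",
    ),
)

row = sample.iloc[3]     # one row as a Series, labelled by column name
rows = sample.iloc[3:5]  # positions 3 and 4 as a DataFrame; the slice end is excluded

print(row["Value"])
print(rows)
```

Because the labels survive, a field can be read by name (`row["Value"]`) instead of by position.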

<br>You can apply a function to every field in a column at once and then do further processing with the results. We will check whether the 'Visibility' column contains 'Hide'. The **str.contains()** function returns a boolean per row.

In [3]:
rowsToHide = testData['Visibility'].str.contains("Hide").fillna(False)
rowsToHide.head(5)

Date and time
2019-01-01 00:00:00    False
2019-01-01 06:00:00     True
2019-01-01 12:00:00    False
2019-01-01 18:00:00    False
2019-01-02 00:00:00    False
Name: Visibility, dtype: bool

Because contains() returns a boolean per row, we get a Series of True/False values with the same length as the original DataFrame.
The column could contain entries that are not valid strings (missing values, for example), and contains() may return NaN for those rows. You can add **.fillna(False)** after the closing bracket of contains() to set those to False in the returned results, or to any other acceptable value, depending on what type the function returns.
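As an alternative to chaining a fill step, **str.contains()** also accepts an `na` argument that sets the result for missing values directly. A minimal sketch, assuming a hypothetical four-row column with one missing entry:

```python
import pandas as pd

# Hypothetical column with one missing entry.
visibility = pd.Series(["Show", "Hide", None, "Show"], name="Visibility")

# na=False sets the result for missing values,
# so no separate fill step is needed to get a clean True/False mask.
mask = visibility.str.contains("Hide", na=False)

print(mask.tolist())
```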

In a case like this, where a Series of booleans is returned, you can invert it with ~(name):
```python
rowsToShow = ~(rowsToHide)
```
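A small sketch of what the inversion does, using a hypothetical hand-written mask in place of the one computed from the CSV. The mask should have a boolean dtype for `~` to behave as a logical NOT, which is one reason to fill missing values first:

```python
import pandas as pd

# Hypothetical mask, standing in for rowsToHide.
rowsToHide = pd.Series([False, True, False, False, True])
rowsToShow = ~rowsToHide

print(rowsToShow.tolist())

# Every row ends up in exactly one of the two masks.
print(rowsToHide.sum(), rowsToShow.sum(), len(rowsToHide))
```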

These Series can be used with a DataFrame to show or use only the rows where the value is True, by putting the Series in a separate set of square brackets. Note that we do not use quotes inside the square brackets here, because we are passing a variable, not a column name.

In [93]:
testData[rowsToHide]

| Date and time | Visibility | Value | Value2 | Value3 |
|---|---|---|---|---|
| 2019-01-01 06:00:00 | Hide | 73 | 37 | B |
| 2019-01-04 12:00:00 | Hide | 95 | 39 | C |
| 2019-01-06 12:00:00 | Hide | 83 | 95 | E |
| 2019-01-08 18:00:00 | Hide | 91 | 61 | B |
| 2019-01-11 00:00:00 | Hide | 1 | 74 | E |


<br>You can also combine them with a selection of one or more columns:

In [7]:
testData[['Value', 'Value2']][rowsToHide]

| Date and time | Value | Value2 |
|---|---|---|
| 2019-01-01 06:00:00 | 73 | 37 |
| 2019-01-04 12:00:00 | 95 | 39 |
| 2019-01-06 12:00:00 | 83 | 95 |
| 2019-01-08 18:00:00 | 91 | 61 |
| 2019-01-11 00:00:00 | 1 | 74 |
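The same row and column selection can also be written as a single **.loc** call, with the mask first and the column list second. A minimal sketch, assuming a hypothetical five-row frame named `sample` in place of the CSV data:

```python
import pandas as pd

# Hypothetical stand-in for startdata.csv.
sample = pd.DataFrame(
    {
        "Visibility": ["Show", "Hide", "Show", "Show", "Show"],
        "Value": [30, 73, 44, 61, 26],
        "Value2": [42, 37, 51, 8, 1],
        "Value3": ["A", "B", "C", "D", "E"],
    }
)

mask = sample["Visibility"].str.contains("Hide", na=False)

# Rows where the mask is True, restricted to two columns, in one step.
subset = sample.loc[mask, ["Value", "Value2"]]

print(subset)
```

Both forms return the same rows here; the `.loc` form does the selection in one indexing operation instead of two.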


<br>Above we made a variable rowsToHide, but if we only need this once, we can also do this inline:

In [95]:
testData['Value'][testData['Visibility'].str.contains("Hide")]

Date and time
2019-01-01 06:00:00    73
2019-01-04 12:00:00    95
2019-01-06 12:00:00    83
2019-01-08 18:00:00    91
2019-01-11 00:00:00     1
Name: Value, dtype: int64

We can use these Series with boolean values in combination with **.loc** to set values in the selected rows to a different value, without changing the others:

In [9]:
testData.loc[rowsToHide, 'Value'] = 0
testData.head(5)

| Date and time | Visibility | Value | Value2 | Value3 |
|---|---|---|---|---|
| 2019-01-01 00:00:00 | Show | 30 | 42 | A |
| 2019-01-01 06:00:00 | Hide | 0 | 37 | B |
| 2019-01-01 12:00:00 | Show | 44 | 51 | C |
| 2019-01-01 18:00:00 | Show | 61 | 8 | D |
| 2019-01-02 00:00:00 | Show | 26 | 1 | E |
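The assignment goes through **.loc** rather than the chained form `testData['Value'][rowsToHide] = 0`, because the chained form may operate on a temporary copy and leave the original DataFrame unchanged. The sketch below repeats the assignment on a hypothetical five-row frame named `sample` and checks that only the masked row changes:

```python
import pandas as pd

# Hypothetical stand-in for startdata.csv.
sample = pd.DataFrame(
    {
        "Visibility": ["Show", "Hide", "Show", "Show", "Show"],
        "Value": [30, 73, 44, 61, 26],
        "Value2": [42, 37, 51, 8, 1],
    }
)

mask = sample["Visibility"].str.contains("Hide", na=False)

# Row selector first, column selector second, then assign.
sample.loc[mask, "Value"] = 0

print(sample)
```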


<br>If we want to know which unique values appear in a column, we can use **.unique()** on that column:

In [10]:
testData['Value3'].unique()

array(['A', 'B', 'C', 'D', 'E', 'F'], dtype=object)

If we just want to know how many unique values are present in the column, we can use **.nunique()**:

In [3]:
testData['Value3'].nunique()

6

We can also create a set from the column to get the unique entries:

In [12]:
set(testData['Value3'])

{'A', 'B', 'C', 'D', 'E', 'F'}
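The three approaches differ in what they return. A short sketch on a hypothetical six-element column: **.unique()** gives an array in order of first appearance, a set has no guaranteed order (so it is sorted here for a stable result), and **.nunique()** gives only the count.

```python
import pandas as pd

# Hypothetical column, standing in for testData['Value3'].
letters = pd.Series(["B", "A", "B", "C", "A", "B"], name="Value3")

first_seen = letters.unique()     # array, in order of first appearance
as_sorted = sorted(set(letters))  # sets are unordered, so sort for a stable list
how_many = letters.nunique()      # number of distinct values

print(list(first_seen))
print(as_sorted)
print(how_many)
```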

<br>If we also want to know how many times each of those unique values appears in the column, we can use **.value_counts()**:

In [11]:
testData['Value3'].value_counts()

B    8
A    8
E    8
D    8
C    8
F    8
Name: Value3, dtype: int64
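By default **.value_counts()** sorts by count, largest first, so when counts are tied (as above, where every letter appears 8 times) the order among the tied labels carries no meaning. A minimal sketch on a hypothetical column, adding **.sort_index()** to order the result by label instead:

```python
import pandas as pd

# Hypothetical column, standing in for testData['Value3'].
letters = pd.Series(["B", "A", "B", "C", "A", "B"], name="Value3")

counts = letters.value_counts()  # sorted by count, largest first
by_label = counts.sort_index()   # same counts, ordered by label

print(counts)
print(by_label)
```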

Next: [Plotting](07-Plotting.ipynb)