# Column operations

```python
import pandas as pd

testData = pd.read_csv("./data/startdata.csv", sep=';', index_col=['Date and time'],
                       parse_dates=['Date and time'], dayfirst=True)
```
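`startdata.csv` is not included here, so the sketch below builds a tiny in-memory stand-in to show what each `read_csv` parameter does. The rows are made up for illustration. Only the column names come from the output shown later in this notebook.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout: semicolon-separated, day-first dates
raw = io.StringIO(
    "Date and time;Visibility;Value;Value2;Value3\n"
    "01/01/2019 12:00;Show;10;5;A\n"
    "02/01/2019 12:00;Hide;98;65;B\n"
)

df = pd.read_csv(
    raw,
    sep=";",                        # fields are separated by semicolons
    index_col=["Date and time"],    # use the timestamp column as the row index
    parse_dates=["Date and time"],  # parse that column into datetimes
    dayfirst=True,                  # read 02/01/2019 as 2 January, not 1 February
)
print(df.index[1])
```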

## Converting columns

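To change the type of a column after importing it, assign the converted column back to the DataFrame:
```python
import pandas as pd

# df is a DataFrame with a 'Year' column stored as integers
df['Year'] = pd.to_datetime(df['Year'], format='%Y')
```
Here an integer year is parsed into a datetime (January 1st of that year). Other conversions follow the same pattern, for example `.astype()` for a general type cast or `pd.to_numeric()` for numbers stored as text.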
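A self-contained sketch of these conversions on a small hypothetical DataFrame (the columns `Year`, `Amount` and `Label` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2017, 2018, 2019],        # year stored as int
    "Amount": ["1.5", "2", "n/a"],     # numbers stored as text, one bad entry
    "Label": ["low", "high", "low"],   # repeated strings
})

# Parse the integer year into a datetime (January 1st of that year)
df["Year"] = pd.to_datetime(df["Year"], format="%Y")

# Text to float; errors="coerce" turns values that cannot be parsed into NaN
df["Amount"] = pd.to_numeric(df["Amount"], errors="coerce")

# General cast with .astype(), here to the memory-saving category dtype
df["Label"] = df["Label"].astype("category")

print(df.dtypes)
```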

## Info about column content

To see which unique values appear in a column, use **.unique()** on that column; it returns them in order of first appearance. Another way is to build a Python `set` from the column, which gives the same values but in no particular order.

```python
testData['Value3'].unique()
```
```
array(['A', 'B', 'C', 'D', 'E', 'F'], dtype=object)
```
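A small sketch comparing the two approaches on a made-up Series:

```python
import pandas as pd

s = pd.Series(["B", "A", "B", "C", "A"])

uniques = s.unique()        # NumPy array in order of first appearance: B, A, C
as_set = set(s)             # same values, but unordered
as_sorted = sorted(as_set)  # sort explicitly if a stable order is needed
print(uniques, as_sorted)
```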

If we only want to know how many unique values the column contains, we can use **.nunique()**. By default it does not count missing values (`NaN`).

```python
testData['Value3'].nunique()
```
```
6
```
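A short sketch of how missing values affect the count, using a made-up Series; the `dropna` parameter controls whether `NaN` is counted:

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "B", np.nan, "A"])

n_default = s.nunique()           # NaN ignored: A and B
n_with_nan = s.nunique(dropna=False)  # NaN counted as a value of its own
print(n_default, n_with_nan)
```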

To count how many times each of those unique values appears in the column, use **.value_counts()**. The result is sorted by count, most frequent first.

```python
testData['Value3'].value_counts()
```
```
B    8
A    8
E    8
D    8
C    8
F    8
Name: Value3, dtype: int64
```

Passing `normalize=True` to `.value_counts()` returns each value's share of the total instead of its count, as a fraction between 0 and 1; multiply by 100 to get a percentage.
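A minimal sketch contrasting raw counts with normalized shares on a made-up Series:

```python
import pandas as pd

s = pd.Series(["A", "B", "A", "C"])

counts = s.value_counts()                # A: 2, B: 1, C: 1
shares = s.value_counts(normalize=True)  # A: 0.5, B: 0.25, C: 0.25
percentages = shares * 100               # A: 50.0, B: 25.0, C: 25.0
print(percentages)
```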

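To find the index label of the row holding the highest value in a column, use **.idxmax()**. If the maximum occurs more than once, only the first matching label is returned. It can also be called on the whole DataFrame, returning one label per column, as long as all columns are numeric.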

```python
testData['Value'].idxmax()
```
```
Timestamp('2019-01-03 12:00:00')
```
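A sketch on hypothetical timestamp-indexed data, showing the tie behaviour, the per-column result on a DataFrame, and the matching `.idxmin()`:

```python
import pandas as pd

# Hypothetical numeric data; "Value" has its maximum (98) on two rows
idx = pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])
df = pd.DataFrame({"Value": [10, 98, 98], "Value2": [65, 3, 7]}, index=idx)

first_max = df["Value"].idxmax()   # only the first label holding 98
per_column = df.idxmax()           # one label per column
min_label = df["Value2"].idxmin()  # label of the smallest value
print(first_max, min_label)
print(per_column)
```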

Another way is a boolean mask, which returns the complete matching row(s) instead of just the index label:

```python
testData[testData['Value'] == testData['Value'].max()]
```

| Date and time | Visibility | Value | Value2 | Value3 |
|---|---|---|---|---|
| 2019-01-03 12:00:00 | Show | 98 | 65 | E |
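One practical difference, sketched on made-up data: when several rows share the maximum, the mask keeps all of them, while `.idxmax()` points to the first one only.

```python
import pandas as pd

idx = pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])
df = pd.DataFrame({"Value": [10, 98, 98], "Value3": ["A", "B", "E"]}, index=idx)

top_rows = df[df["Value"] == df["Value"].max()]  # every row tied for the maximum
first_only = df.loc[df["Value"].idxmax()]        # just the first of those rows
print(top_rows)
print(first_only)
```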


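The same mask pattern works with `.min()` (which also has its own `.idxmin()`). It can be tried with statistics such as `.mean()` or `.median()`, which have no `idx` counterpart, but an exact `==` comparison only returns rows whose value equals that statistic exactly, which is rare for a mean.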
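When no row matches exactly, one option (a swapped-in technique, not part of the original notebook) is to look for the row closest to the statistic by taking the smallest absolute difference. A sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical column; its mean (50.0) does not occur in the data
s = pd.Series([10, 40, 55, 95], index=["a", "b", "c", "d"])

exact = s[s == s.mean()]                 # empty: no value equals 50.0 exactly
closest = (s - s.mean()).abs().idxmin()  # label of the value nearest the mean
print(exact.empty, closest)
```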
