# DataFrame Attributes and Methods

For the full list of attributes and methods, see the [DataFrame](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) page of the pandas API reference, which has *Attributes* and *Methods* summary tables after the class description.

In [None]:
import numpy as np
import pandas as pd
df = pd.read_csv("./iris-data-column.csv", header=0)
df

## Attributes
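The attributes used below are read-only: you cannot assign a new value to them. (A few other attributes, such as `df.columns` and `df.index`, can be reassigned.)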

We have already met many of the attributes of *DataFrame* when introducing *ndarray*.

Note that the attributes don't have parentheses after them: they are not functions that can be executed.
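A minimal sketch of the read-only behaviour, using a hypothetical three-row stand-in for the iris file (the values are made up). Assigning to `df.ndim` raises an `AttributeError`, while `df.columns` accepts a new list of labels of the same length.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; values are made up.
df = pd.DataFrame({
    "Sepal-Length": [5.0, 6.0, 7.0],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Reading an attribute: no parentheses.
print(df.ndim)  # 2

# Assigning to a read-only attribute such as ndim fails.
try:
    df.ndim = 3
    ndim_assignment_failed = False
except AttributeError as err:
    ndim_assignment_failed = True
    print("AttributeError:", err)

# df.columns, by contrast, can be replaced by a list of the same length.
df.columns = ["sepal_length", "class"]
print(list(df.columns))
```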

In [None]:
df.shape

* The result is a tuple `(number of rows, number of columns)`
* You can access just the number of rows with `df.shape[0]`, or just the number of columns with `df.shape[1]`
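A small sketch of reading the tuple, on a hypothetical four-row frame (made-up values). Tuple unpacking gives both counts at once, and `len(df)` is a common shorthand for the number of rows.

```python
import pandas as pd

# Hypothetical four-row stand-in for the iris data.
df = pd.DataFrame({
    "Sepal-Length": [5.0, 6.0, 7.0, 6.5],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

n_rows, n_cols = df.shape        # unpack (rows, columns)
print(n_rows, n_cols)            # 4 2
print(df.shape[0] == len(df))    # True: len() of a DataFrame counts rows
```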

In [None]:
df.T

* This transposes the rows and columns
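A quick sketch of what transposing does to the labels and types, assuming a small made-up frame with one numeric and one text column. The column labels become the row index, and because each new column mixes floats and strings, every column of the result is `object`.

```python
import pandas as pd

# Hypothetical stand-in with a numeric and a text column.
df = pd.DataFrame({
    "Sepal-Length": [5.0, 6.0, 7.0],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

t = df.T
print(df.shape, t.shape)   # (3, 2) (2, 3)
print(list(t.index))       # ['Sepal-Length', 'Class']
print(t.dtypes)            # every transposed column now holds mixed values -> object
```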

In [None]:
df.values

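* This returns a NumPy array
  - Note that it uses the *lowest common denominator* dtype, i.e. it upcasts: integer and float columns together give `float64`, and any string column makes the whole array `object`
  - However, `.to_numpy()` is preferred (see below)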
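A sketch of the upcasting rule, using a hypothetical integer column `Sample-Id` (not in the iris file) alongside made-up float and text columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Sample-Id": [1, 2, 3],              # hypothetical int64 column
    "Sepal-Length": [5.0, 6.0, 7.0],     # float64
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

numeric = df[["Sample-Id", "Sepal-Length"]].values
print(numeric.dtype)   # float64: the int64 column is upcast to float64

mixed = df.values
print(mixed.dtype)     # object: the string column forces the most general dtype
```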

In [None]:
arr1 = df.drop('Class', axis=1, inplace=False).values
print(arr1)
print(arr1.dtype)

In [None]:
print("df.ndim =", df.ndim, "\n")
print("df.dtypes =", df.dtypes, "\n")
print("df.axes =", df.axes, "\n")
print("df.empty =", df.empty)
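A short sketch of what `df.axes` and `df.empty` hold, on a made-up two-row frame. `axes` is a list `[row index, column index]`, and `empty` is `True` whenever either axis has length 0.

```python
import pandas as pd

# Hypothetical two-row stand-in; values are made up.
df = pd.DataFrame({"Sepal-Length": [5.0, 6.0], "Class": ["Iris-setosa", "Iris-versicolor"]})

row_index, column_index = df.axes   # axes is [index, columns]
print(list(row_index))              # [0, 1]
print(list(column_index))           # ['Sepal-Length', 'Class']

print(df.empty)                     # False
print(pd.DataFrame().empty)         # True: no rows and no columns
print(df.iloc[0:0].empty)           # True: columns exist but there are zero rows
```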

## Statistical Methods

In [None]:
help(df.mean)

In [None]:
df.mean(axis=0, numeric_only=True)

This can seem counter-intuitive:
* `axis=0` works down each column (across the rows), giving one result per column
* `axis=1` works across each row (across the columns), giving one result per row
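A worked sketch of the two directions on a made-up frame; `Sepal-Width` is a hypothetical second numeric column. With `axis=0` the result is indexed by column name, with `axis=1` by row label.

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal-Length": [5.0, 7.0],
    "Sepal-Width": [3.0, 4.0],          # hypothetical column
    "Class": ["Iris-setosa", "Iris-virginica"],
})

col_means = df.mean(axis=0, numeric_only=True)  # one value per column
row_means = df.mean(axis=1, numeric_only=True)  # one value per row

print(col_means.to_dict())  # {'Sepal-Length': 6.0, 'Sepal-Width': 3.5}
print(row_means.to_dict())  # {0: 4.0, 1: 5.5}
```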

In [None]:
df.std(axis=0, numeric_only=True)

`axis` behaves in the same way as for `df.mean`
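One detail worth knowing when comparing with NumPy: `df.std` uses `ddof=1` (the sample standard deviation) by default, while `np.std` uses `ddof=0`. A sketch with two made-up values, 4.0 and 6.0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sepal-Length": [4.0, 6.0]})  # made-up values

sample = df.std(axis=0, numeric_only=True)["Sepal-Length"]              # ddof=1 by default
population = df.std(axis=0, numeric_only=True, ddof=0)["Sepal-Length"]  # divide by n
numpy_default = np.std(df["Sepal-Length"].to_numpy())                   # NumPy default ddof=0

print(sample)         # about 1.414 (sqrt(2))
print(population)     # 1.0
print(numpy_default)  # 1.0
```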

In [None]:
df.sum(axis=0, numeric_only=True)

In [None]:
df.sum(axis=1, numeric_only=True)

* `.max()` and `.min()` work the same way as `.sum()`
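A sketch of `.max()` down the columns and `.min()` across the rows, on made-up values with a hypothetical `Sepal-Width` column:

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal-Length": [5.0, 7.0, 6.0],
    "Sepal-Width": [3.0, 4.0, 2.0],     # hypothetical column
    "Class": ["Iris-setosa", "Iris-virginica", "Iris-versicolor"],
})

col_max = df.max(axis=0, numeric_only=True)  # largest value in each column
row_min = df.min(axis=1, numeric_only=True)  # smallest value in each row

print(col_max.to_dict())  # {'Sepal-Length': 7.0, 'Sepal-Width': 4.0}
print(row_min.to_dict())  # {0: 3.0, 1: 4.0, 2: 2.0}
```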

In [None]:
df.describe()
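By default `describe()` summarises only the numeric columns. A sketch on made-up data; passing `include="all"` also adds `unique`, `top` (most frequent value) and `freq` for the text column.

```python
import pandas as pd

df = pd.DataFrame({
    "Sepal-Length": [5.0, 6.0, 7.0, 8.0],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

summary = df.describe()                 # numeric columns only
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(summary.columns))            # ['Sepal-Length']

summary_all = df.describe(include="all")
print(summary_all.loc["top", "Class"])  # Iris-setosa
print(summary_all.loc["freq", "Class"]) # 2
```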

## Other Methods

In [None]:
df.head(n=2)

* The default for `n` is 5.

In [None]:
df.tail(n=2)

* The default for `n` is also 5
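A sketch confirming the defaults on a made-up seven-row frame. A negative `n` is also accepted: `head(-2)` returns every row except the last two.

```python
import pandas as pd

# Made-up seven-row frame.
df = pd.DataFrame({"Sepal-Length": [4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9]})

print(len(df.head()))     # 5: default n=5
print(len(df.tail()))     # 5
print(df.head(-2).shape)  # (5, 1): all rows except the last 2
```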

In [None]:
df.to_numpy()

* This method is now preferred over the `.values` attribute
  - In particular, its `dtype`, `copy` and `na_value` parameters give better control over data types and missing values
* Note that it still returns a *lowest common denominator* dtype, upcasting if necessary
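A sketch of the extra control `.to_numpy()` offers, on made-up data with one missing value: `dtype` fixes the output type and `na_value` replaces missing entries during the conversion.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sepal-Length": [5.0, np.nan, 7.0],   # one missing value
    "Sepal-Width": [3, 4, 2],             # hypothetical int column
})

arr = df.to_numpy(dtype="float64", na_value=0.0)  # NaN becomes 0.0 in the array
print(arr.dtype)   # float64
print(arr[1, 0])   # 0.0
```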

In [None]:
arr2 = df.drop('Class', axis=1, inplace=False).to_numpy()
print(arr2)
print(arr2.dtype)

In [None]:
help(df.applymap)

In [None]:
am = df.loc[:, ['Sepal-Length']].applymap(lambda x: x + 10)
am

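* `df.loc[:, ['Sepal-Length']]` creates a new DataFrame
  - The `[ ]` around the column name is required; otherwise a *Series* is returned, and `.applymap` is not defined for *Series*
* In pandas 2.1 and later, `applymap` is deprecated and renamed `DataFrame.map`, which behaves the same way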
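A sketch of the list-versus-label difference, with a small version check so the element-wise step runs on pandas releases before and after 2.1 (the fallback logic is an assumption of this example, not something the notebook requires). Values are made up.

```python
import pandas as pd

df = pd.DataFrame({"Sepal-Length": [5.0, 6.0, 7.0]})  # made-up values

one_col_frame = df.loc[:, ["Sepal-Length"]]   # list of labels -> DataFrame
one_col_series = df.loc[:, "Sepal-Length"]    # single label -> Series
print(type(one_col_frame).__name__, type(one_col_series).__name__)  # DataFrame Series

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = one_col_frame.map if hasattr(pd.DataFrame, "map") else one_col_frame.applymap
am = elementwise(lambda x: x + 10)
print(am["Sepal-Length"].tolist())            # [15.0, 16.0, 17.0]

# A Series already has an element-wise .map().
print(one_col_series.map(lambda x: x + 10).tolist())  # [15.0, 16.0, 17.0]
```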

In [None]:
help(df['Class'].map)

`Series.map()` replaces each value of a *Series* using a function or a dictionary. It is frequently used on single *DataFrame* columns in analytical situations where data need to be encoded.
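When `.map()` is given a dictionary, any value that is not a key becomes `NaN`, which also turns an integer result into floats. A sketch with a made-up mistyped label:

```python
import pandas as pd

classes = pd.Series(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "iris-setosa"])
codes = classes.map({"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2})

print(codes.tolist())      # [0.0, 1.0, 2.0, nan]
print(codes.isna().sum())  # 1: the lower-case label matched no key
```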

In [None]:
df['Class']

In [None]:
an = df['Class'].map({
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2})
an

In [None]:
df['Class'] = an
df
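A sketch of decoding the integers back to names by inverting the dictionary, on a made-up three-row column. It also shows a notebook pitfall: re-running the encoding cell on an already-encoded column finds no matching keys, so every value becomes `NaN`.

```python
import pandas as pd

df = pd.DataFrame({"Class": ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]})
encoding = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

df["Class"] = df["Class"].map(encoding)        # encode: names -> integers
print(df["Class"].tolist())                     # [0, 2, 1]

decoding = {code: name for name, code in encoding.items()}  # invert the dict
decoded = df["Class"].map(decoding)             # decode: integers -> names
print(decoded.tolist())  # ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']

# Encoding a second time: integers are not keys of `encoding`.
print(df["Class"].map(encoding).isna().all())   # True
```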

# End of Notebook