# Agenda, week 2

1. Q&A
2. dtypes
3. `NaN` (not a number)
4. data frames (2D data structures)
5. Adding and removing data in our data frames
6. Useful methods and attributes
7. Querying with boolean indexes
8. Querying with `.loc`
9. Read some CSV data from a file

# dtypes



In [3]:
import numpy as np   # this is not strictly necessary, but very useful
import pandas as pd  # this is necessary!

from pandas import Series, DataFrame   # this is convenient

In [4]:
# let's create a series

s = Series([10, 20, 30, 40, 50])

s

0    10
1    20
2    30
3    40
4    50
dtype: int64

# What's a dtype?

Many people, when they're learning Python, wonder why we talk about "lists" rather than "arrays." After all, aren't they the same?

No. Lists differ from arrays in two ways:

- We can change their size (adding and removing items)
- Each object in a list can be of a different type. In an array, they must all be of the same type.

Fast forward to now: with NumPy and Pandas, we really are dealing with arrays. That means we cannot change their size in place (Pandas does let you add and remove elements, typically by building a new array behind the scenes), and all of the elements have to be of the same type.
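To make the contrast concrete, here is a minimal sketch comparing a plain Python list with a NumPy array. The variable names (`mixed`, `arr`, `promoted`, `bigger`) are made up for illustration and aren't used elsewhere in these notes.

```python
import numpy as np

# A Python list can hold different types and can grow in place.
mixed = [10, "twenty", 30.0]
mixed.append(40)

# A NumPy array has a single dtype shared by every element.
arr = np.array([10, 20, 30], dtype=np.int64)

# Mixing integers and floats gives one common dtype for the whole array.
promoted = np.array([10, 20.5, 30])

# np.append does not resize arr; it returns a new, larger array.
bigger = np.append(arr, 40)

print(len(mixed))              # the list grew to 4 items
print(arr.dtype)               # int64
print(promoted.dtype)          # float64
print(arr.size, bigger.size)   # arr is unchanged; bigger has one more item
```

Notice that `arr` still has three elements after the call to `np.append`. The "bigger" array is a separate object.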

In the worlds of NumPy and Pandas, that type is known as the "dtype," the data type.

What options do we have for dtypes? These are (mostly) set by NumPy.

Dtypes

- Integers
    - `np.int8`
    - `np.int16`
    - `np.int32`
    - `np.int64` -- the default!
- Unsigned integers
    - `np.uint8`
    - `np.uint16`
    - `np.uint32`
    - `np.uint64`
- Floats
    - `np.float16`
    - `np.float32`
    - `np.float64` -- the default!
    - `np.float128` (available only on some platforms)
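
The number in each dtype's name is its size in bits, which determines the range of values it can hold. As a rough sketch, NumPy's `np.iinfo` and `np.finfo` functions report those limits. The loop below skips `np.float128`, because it doesn't exist on every platform.

```python
import numpy as np

# Signed and unsigned integers: number of bits, smallest and largest value.
for dt in (np.int8, np.int16, np.int32, np.int64,
           np.uint8, np.uint16, np.uint32, np.uint64):
    info = np.iinfo(dt)
    print(f"{dt.__name__:>7}: {info.bits:2d} bits, {info.min} to {info.max}")

# Floats: number of bits and the largest representable value.
for dt in (np.float16, np.float32, np.float64):
    info = np.finfo(dt)
    print(f"{dt.__name__:>7}: {info.bits:2d} bits, max {info.max}")
```

For example, `np.int8` runs from -128 to 127, while `np.uint8` runs from 0 to 255.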
    
# What does this mean?

If you don't specify a dtype when you create a series, Pandas will guess what you need, based on the data:

- If you have only integers, then it'll use `np.int64`
- If you have any floating-point numbers, then it'll use `np.float64`
- If you have strings or other funny Python objects, then it'll use `object` as its type
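
Here is a small sketch of those three cases. The series names are invented for this example. The third series deliberately mixes several kinds of Python objects; note that some newer Pandas versions may pick a dedicated string dtype, rather than `object`, for data consisting only of strings.

```python
import pandas as pd

# Only integers: Pandas picks int64.
ints = pd.Series([10, 20, 30])

# A single float is enough to make the whole series float64.
floats = pd.Series([10, 20.5, 30])

# A mix of arbitrary Python objects falls back to the object dtype.
objects = pd.Series([10, "twenty", [30, 40]])

print(ints.dtype)
print(floats.dtype)
print(objects.dtype)
```

In the `floats` series, the integers 10 and 30 are stored as 10.0 and 30.0, because every element has to share the same dtype.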

In [5]:
# we can get the dtype of a series by retrieving the dtype attribute

s.dtype

dtype('int64')

If you don't want to specify `np.int8`, then you can instead say `'int