# The pandas DataFrame and Series

When you load a data file with `pandas`, for example with `read_csv`, you get a `DataFrame` object.

You can think of it as a table with rows and columns.

By asking the type of the object, we can see that it is a `DataFrame` object, coming from the `pandas` library.
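As a minimal sketch of this step, the snippet below loads a small table and checks its type. The CSV text, the column names (`Player`, `Birthday`, `Goals`, `Matches`), and the values are hypothetical stand-ins for a real data file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data file.
csv_text = """Player,Birthday,Goals,Matches
Player A,1990-05-14,30,40
Player B,1995-11-02,12,36
Player C,1988-01-23,21,42
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))   # the class of the loaded object
print(df.shape)   # (rows, columns)
```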

If you access one of the columns, you get a pandas `Series` object.

It's important to note that a `DataFrame` is a collection of `Series` objects that share the same index.
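The relationship between the two objects can be checked directly. This is a sketch on a hypothetical table (the column names and values are invented for illustration).

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Goals": [30, 12, 21],
    "Matches": [40, 36, 42],
})

goals = df["Goals"]          # single label -> Series
goals_frame = df[["Goals"]]  # list of labels -> DataFrame with one column

print(type(goals))
print(type(goals_frame))

# Iterating over the columns yields (name, Series) pairs.
for name, column in df.items():
    print(name, type(column).__name__)
```

Note the difference between single and double brackets: the first returns a `Series`, the second a one-column `DataFrame`.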

Sometimes you want to work with the whole `DataFrame`; other times, with a single `Series`.

But what are the main differences between them?

This tutorial addresses some of the most common errors and questions about the `DataFrame` and `Series` objects.

## Questions

1. Why do you get a `KeyError` when you try to access a column?
2. When should you use `reset_index`?
3. What is the difference between a `DataFrame` and a `Series`?
4. Why are `dtypes` important?
5. How do you access special functions in a `Series` object?
6. How do you create a new column from the existing ones?

## Answers

### Dissecting pandas objects

The `DataFrame` and `Series` objects are the core of the `pandas` library.

They are more than a mere table or column.

<div>
<img src="src/DataFrame.jpg" width="45%"/>
<img src="src/Series.jpg" width="45%"/>
</div>

They are supercharged with many functions and attributes that allow you to manipulate the data in many ways.

Let's address the most essential concepts in the following sections.

### Pandas index vs column

An essential concept you must understand is the difference between the index and the columns.

A common source of confusion is a `DataFrame` whose `index` looks like a column but is not one.

If you want to access the `Player` "column":

You get a `KeyError` because the leftmost "column" is not a column but the `index`.

Therefore, you must access the values through the `index` attribute:
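The sketch below reproduces the situation on a hypothetical table where `Player` is the index (the `Goals` and `Matches` columns and all values are invented for illustration).

```python
import pandas as pd

# Hypothetical data: "Player" is the index, not a column.
df = pd.DataFrame(
    {"Goals": [30, 12, 21], "Matches": [40, 36, 42]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)

try:
    df["Player"]  # looks up a column label, and there is none called "Player"
except KeyError as error:
    print("KeyError:", error)

# The player names live in the index attribute instead.
print(df.index.name)
print(df.index.tolist())
```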

Where does this concept trip you up most often?

Data visualization.

### The index is not a column

Let's say you want to create a bar plot with the `Player` column:

Since the `Player` is not a column, but an index, you cannot use it as such.

### Why reset the index?

You can, however, reset the index to turn `Player` into a column:

Now you can use the `Player` column as the x-axis in the plot.
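A minimal sketch of the before and after, again on hypothetical data. It only checks which labels are available as columns and does not draw anything, so it makes no assumption about the plotting library you use.

```python
import pandas as pd

# Hypothetical data: "Player" starts out as the index.
df = pd.DataFrame(
    {"Goals": [30, 12, 21], "Matches": [40, 36, 42]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)

# reset_index returns a new DataFrame; the old index becomes a regular column.
df_reset = df.reset_index()

print("Player" in df.columns)        # False
print("Player" in df_reset.columns)  # True
print(df_reset.columns.tolist())
```

With `Player` available as a column, a call such as `df_reset.plot.bar(x="Player", y="Goals")` can use it for the x-axis (pandas plotting requires `matplotlib` to be installed).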

### Pandas accessors to special functions

Since the values of `Birthday` have a datetime dtype (such as `datetime64[ns]`), the `dt` accessor gives you access to functions specific to this data type.
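As an illustration, the sketch below builds a hypothetical `Birthday` column with `pd.to_datetime` (the dates are invented) and reads a few datetime components through `dt`.

```python
import pandas as pd

# Hypothetical data: birthdays parsed into a datetime column.
df = pd.DataFrame(
    {"Birthday": pd.to_datetime(["1990-05-14", "1995-11-02", "1988-01-23"])},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)

print(df["Birthday"].dtype)

# Each dt attribute or method returns a new Series aligned with the original.
years = df["Birthday"].dt.year
month_names = df["Birthday"].dt.month_name()

print(years.tolist())
print(month_names.tolist())
```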

What if this column held strings instead of datetime values?

You'll get an `AttributeError` because the `dt` accessor is only available for datetime objects.
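The error can be reproduced with a hypothetical `Series` that stores the dates as plain text:

```python
import pandas as pd

# Hypothetical data: the dates are stored as strings, not datetimes.
birthday_text = pd.Series(
    ["1990-05-14", "1995-11-02", "1988-01-23"], name="Birthday"
)

try:
    birthday_text.dt.year  # .dt is not available for string values
except AttributeError as error:
    print("AttributeError:", error)
```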

But you can use the `str` accessor to apply string functions, such as `split`, and extract the year, month, and day.

You can also expand the result into a `DataFrame`, rename its columns to `year`, `month`, and `day`, and finally `join` it to the original `DataFrame`.
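One way to sketch the whole chain, assuming the dates are strings in `YYYY-MM-DD` format (the data below is hypothetical, and passing `expand=True` is one possible way to obtain the `DataFrame`):

```python
import pandas as pd

# Hypothetical data: birthdays stored as "YYYY-MM-DD" strings.
df = pd.DataFrame(
    {"Birthday": ["1990-05-14", "1995-11-02", "1988-01-23"]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)

# Without expand, each row holds the list of split pieces.
pieces = df["Birthday"].str.split("-")
print(pieces)

# With expand=True, the pieces are spread over columns labelled 0, 1, 2.
parts = df["Birthday"].str.split("-", expand=True)
parts = parts.rename(columns={0: "year", 1: "month", 2: "day"})

# join aligns both objects on the shared index (Player).
df_joined = df.join(parts)
print(df_joined)
```

The new columns still contain strings; convert them with `astype(int)` if you need numbers.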

### Creating new columns

You can also operate on existing `Series` objects to create a new one, then add it as a new column to the `DataFrame`.
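For instance, dividing one column by another produces a new `Series`. The `Goals`, `Matches`, and `GoalsPerMatch` names and the values below are hypothetical.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame(
    {"Goals": [30, 12, 21], "Matches": [40, 36, 42]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)

# The division is applied element by element, aligned on the index.
goals_per_match = df["Goals"] / df["Matches"]
print(type(goals_per_match))

# Assigning to a new label adds the Series as a column.
df["GoalsPerMatch"] = goals_per_match
print(df)
```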

## Conclusions

1. The `DataFrame` is a collection of `Series` objects.
2. Many `DataFrame` methods operate on all of its `Series` objects at once.
3. The `index` is not a column.
4. Use the `reset_index` method to turn the `index` into a column.
5. Use the `dt` accessor to access datetime functions.
6. Use the `str` accessor to access string functions.
7. Create a new `Series` by operating on the existing ones in the `DataFrame`.