# cuDF

Now let's move on to some higher-level APIs, starting with `cuDF`. Like `pandas`, the `cudf` library is a dataframe package for working with tabular datasets.

Data is loaded onto the GPU and all operations are performed with GPU compute, but the API of `cudf` should feel very familiar to `pandas` users.

In [None]:
import cudf

In this tutorial we have some data stored in `data/`. Most of this data is too small to really benefit from GPU acceleration, but let's explore it anyway.

In [None]:
pageviews = cudf.read_csv("data/pageviews_small.csv", sep=" ")
pageviews

This `pageviews_small.csv` file contains just over `1M` records of pageview counts from Wikipedia in various languages.

Let's rename the columns and drop the unused `x` column.

In [None]:
pageviews.columns = ['project', 'page', 'requests', 'x']

pageviews = pageviews.drop('x', axis=1)

pageviews
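
As an alternative to renaming and dropping after the load, the column names can be supplied at read time. The sketch below is only an illustration: it assumes the file has no header row (not confirmed here), uses a made-up two-line sample instead of the real data, and falls back to `pandas` when `cudf` is not installed, since `read_csv` takes these arguments in both libraries.

```python
import io

try:
    import cudf as xdf  # GPU dataframes
except ImportError:
    import pandas as xdf  # CPU fallback; read_csv takes the same arguments here

# Hypothetical sample in the same space-separated layout (assumed: no header row)
sample = io.StringIO("en Main_Page 42 0\nfr Accueil 7 0\n")

sample_views = xdf.read_csv(
    sample,
    sep=" ",
    header=None,                                # treat the first line as data
    names=["project", "page", "requests", "x"], # name the columns while reading
    usecols=["project", "page", "requests"],    # skip the unused `x` column
)
sample_views
```

Skipping a column with `usecols` also avoids parsing data that would be dropped straight away.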

Next let's count how many English records are in this dataset.

In [None]:
pageviews[pageviews.project == 'en'].count()
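
Note that `.count()` on a dataframe returns one number per column (the non-null values in that column) rather than a single row count. Here is a minimal sketch of the difference, using a made-up `toy` frame rather than the real pageviews data, and falling back to `pandas` when `cudf` is not installed:

```python
try:
    import cudf as xdf  # GPU dataframes
except ImportError:
    import pandas as xdf  # CPU fallback; the calls below exist in both libraries

# Hypothetical toy data standing in for the pageviews table
toy = xdf.DataFrame({
    "project": ["en", "en", "fr", "en"],
    "page": ["A", "B", "C", None],   # one missing page title
    "requests": [3, 1, 2, 5],
})

is_english = toy["project"] == "en"
english = toy[is_english]

per_column = english.count()       # non-null values in each column
n_rows = len(english)              # number of rows, nulls included
n_rows_alt = int(is_english.sum()) # same number, taken from the boolean mask
```

If a single number is all you need, `len(...)` or summing the mask says so more directly than `.count()`.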

Then let's perform a groupby where we count all of the pages by language.

In [None]:
grouped_pageviews = pageviews.groupby('project').count().reset_index()
grouped_pageviews
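
Because `count()` counts non-null values per column, the grouped result above repeats the same kind of count under both `page` and `requests`. As a rough sketch on made-up data (the `toy` frame is hypothetical, and `pandas` is used when `cudf` is not installed), `size()` gives one row count per group and `sum()` gives the total requests per group:

```python
try:
    import cudf as xdf  # GPU dataframes
except ImportError:
    import pandas as xdf  # CPU fallback; the calls below exist in both libraries

# Hypothetical toy data standing in for the pageviews table
toy = xdf.DataFrame({
    "project": ["en", "en", "fr", "en"],
    "page": ["A", "B", "C", "D"],
    "requests": [3, 1, 2, 5],
})

grouped = toy.groupby("project")

rows_per_project = grouped.size()                 # number of records per language
requests_per_project = grouped["requests"].sum()  # total pageviews per language

# Sort by label for display; group order is not relied on here
requests_per_project.sort_index()
```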

And finally, let's have a look at the results for English, French, Chinese and Polish specifically.

In [None]:
grouped_pageviews[grouped_pageviews.project.isin(['en', 'fr', 'zh', 'pl'])]

If you have used `pandas` before then all of this syntax should be very familiar to you. In the same way that `cupy` implements a large portion of the `numpy` API, `cudf` implements a large portion of the `pandas` API.
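
To make the overlap concrete, here is a small sketch in which one function runs unchanged on a `pandas` dataframe and, when `cudf` is available, on a GPU copy of it. The `count_by_project` helper and the toy data are made up for illustration; `cudf.from_pandas` and `.to_pandas()` are the conversion calls used to move data between the two libraries.

```python
import pandas as pd

try:
    import cudf
except ImportError:
    cudf = None  # no GPU stack installed; only the CPU path runs


def count_by_project(df):
    """Count records per project; works on either a pandas or a cudf dataframe."""
    return (
        df.groupby("project")
        .count()
        .reset_index()
        .sort_values("project")
        .reset_index(drop=True)
    )


# Hypothetical toy data standing in for the pageviews table
pdf = pd.DataFrame({
    "project": ["en", "en", "fr"],
    "page": ["A", "B", "C"],
    "requests": [3, 1, 2],
})

cpu_result = count_by_project(pdf)

if cudf is not None:
    gdf = cudf.from_pandas(pdf)                      # copy the data to the GPU
    gpu_result = count_by_project(gdf).to_pandas()   # bring the result back to the CPU
```

Not every `pandas` method has a `cudf` counterpart, so it is worth checking the `cudf` documentation before porting larger pipelines.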

The only difference is that all of our filtering and groupby operations ran on the GPU instead of the CPU, which can give much better performance on larger datasets.