# Data analysis basics with pandas

## Installation

Even though we have Python installed, we still need to install some extra pieces of software! Python is a whole ecosystem, and many of its most useful features come from packages (also called libraries or modules) written by other people or companies.

In [None]:
%pip install --quiet pandas altair lxml tqdm requests

## Using pandas

To use pandas, we first need to **import it**. Then we can go ahead with reading in our data and analyzing it.

In [None]:
import pandas as pd

# This creates a "dataframe" - the Python version of a spreadsheet
# We're reading a CSV straight from the internet, but read_csv also accepts a path to a file on your own computer
df = pd.read_csv("https://raw.githubusercontent.com/jsoma/2024-birn/main/01-pandas/countries.csv")
df
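
Once the data is loaded, a first pass at analysis usually means checking its size, sorting, and summarizing a column. The sketch below is only an illustration: it builds a tiny made-up table in place of the real file so it runs without a network connection. The `gdp_per_capita` and `life_expectancy` column names match the ones used in the graphing section, while the `country` column and all of the values are assumptions.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file; the "country" column
# and every value here are invented for illustration
csv_text = """country,gdp_per_capita,life_expectancy
Aland,1000,60.0
Bland,5000,70.0
Cland,20000,80.0
"""
sample = pd.read_csv(io.StringIO(csv_text))

# How many rows and columns do we have?
print(sample.shape)

# The two rows with the highest life expectancy
top = sample.sort_values("life_expectancy", ascending=False).head(2)
print(top)

# The average of a single column
average_life = sample["life_expectancy"].mean()
print(average_life)
```

The same calls should work on `df` as long as the real file has those column names.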

## Saving

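When you save your dataframe as a CSV with `df.to_csv`, you almost always want to include `index=False`. If you don't, pandas writes the row index as an extra column with no name. When the file is read back in, that column shows up as `Unnamed: 0`, which is irritating to you and your coworkers!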
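
Here is a minimal sketch of the difference, using a small made-up dataframe and a temporary folder so nothing is left behind on disk. The file names and values are placeholders, not part of the countries data.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in dataframe (made-up values, not the countries data)
sample = pd.DataFrame({"country": ["A", "B"], "population": [10, 20]})

with tempfile.TemporaryDirectory() as folder:
    with_index_path = os.path.join(folder, "with_index.csv")
    no_index_path = os.path.join(folder, "no_index.csv")

    # Default behavior: the row numbers are written as an extra first column
    sample.to_csv(with_index_path)
    # index=False leaves the row numbers out
    sample.to_csv(no_index_path, index=False)

    # Read both files back in to compare their columns
    with_index = pd.read_csv(with_index_path)
    no_index = pd.read_csv(no_index_path)

print(list(with_index.columns))  # ['Unnamed: 0', 'country', 'population']
print(list(no_index.columns))    # ['country', 'population']
```

For the real data, the equivalent would be something like `df.to_csv("countries-cleaned.csv", index=False)`, where the file name is whatever you choose.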

## Graphing

There's a good way to graph and a bad way to graph: the default, which is what `df.plot` uses behind the scenes, is [matplotlib](https://matplotlib.org/), which is 100% the worst. A great alternative is [Altair](https://altair-viz.github.io/gallery/index.html), which is easier to work with and produces prettier (and interactive!) graphics.

In [None]:
df.plot(x='gdp_per_capita', y='life_expectancy', kind='scatter')

In [None]:
import altair as alt

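# Map columns to the axes; without .encode() the chart has nothing to plot
alt.Chart(df).mark_circle(size=50).encode(
    x='gdp_per_capita', y='life_expectancy'
).interactive()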