# Altair Intro: Data

In this session, we take a closer look at how to specify data in Altair. Data is usually passed as the first argument to the `alt.Chart` constructor. Altair accepts several kinds of data source. We will look at examples of the three most common: a pandas `DataFrame`, an `alt.Data` object, and a URL pointing to data in either JSON or CSV format.

## Pandas DataFrame

Pandas is a powerful data processing package, and its `DataFrame` class has become the de facto data structure for tabular data in Python. It is also the recommended way of supplying data to Altair. Let's look at a simple example.

In [1]:
import altair as alt
import pandas as pd

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8],
                       "quality": ["standard", "good", "excellent", "standard", "good", "excellent"]})
source

|   | category | value | quality   |
|---|----------|-------|-----------|
| 0 | 1        | 4     | standard  |
| 1 | 2        | 6     | good      |
| 2 | 3        | 10    | excellent |
| 3 | 4        | 3     | standard  |
| 4 | 5        | 7     | good      |
| 5 | 6        | 8     | excellent |


In this example, we created a very simple dataframe directly from a dictionary. Each entry of the dictionary corresponds to a column of the table. We will reuse this toy dataset throughout. In practice, data is more likely to be loaded from a file, for example with `pd.read_csv`.
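As a rough illustration of the file-based route, the sketch below reads the same toy dataset with `pd.read_csv`. To keep it self-contained, the CSV content lives in an in-memory string (`csv_text`, a name made up for this example) instead of a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk;
# with a real file you would pass its path to pd.read_csv instead.
csv_text = """\
category,value,quality
1,4,standard
2,6,good
3,10,excellent
4,3,standard
5,7,good
6,8,excellent
"""

# read_csv accepts a path or any file-like object
source = pd.read_csv(io.StringIO(csv_text))

# Numeric columns are parsed as integers, "quality" as strings
print(source.dtypes)
```

The resulting dataframe has the same columns as the dictionary-built one, so it can be passed to `alt.Chart` in the same way.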

In [27]:
chart = alt.Chart(source)
chart.mark_point().encode(x="category", y="value")

Once the data is ready, we create an Altair chart object with `alt.Chart(source)`. To demonstrate that the data is loaded correctly, we visualize part of it as a scatter plot. We will talk about marks and encodings in another session. For now, we will claim victory, since the plot above is as expected.

## Altair Data

Alternatively, we can construct an Altair `Data` object.

In [30]:
data = alt.Data(values=[
    {"category": 1, "value": 4, "quality": "standard"},
    {"category": 2, "value": 6, "quality": "good"},
    {"category": 3, "value": 10, "quality": "excellent"},
    {"category": 4, "value": 3, "quality": "standard"},
    {"category": 5, "value": 7, "quality": "good"},
    {"category": 6, "value": 8, "quality": "excellent"},
])

chart = alt.Chart(data)
chart.mark_point().encode(x="category:Q", y="value:Q")

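Note that with Altair's `Data` object we are using exactly the same data as before, but it is specified row-wise, in JSON style. The encodings also carry an explicit type (`:Q` for quantitative), because Altair infers column types only from a pandas DataFrame. The rest is the same, and the output scatter plot again looks as expected.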

## URL

Lastly, let's specify data using a URL.

In [40]:
from vega_datasets import data
url = data.cars.url
print(url)

chart = alt.Chart(url)
chart.mark_point().encode(x="Horsepower:Q", y="Weight_in_lbs:Q")

https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/cars.json


In this example, we use the URL that points to the cars dataset and simply pass it as a string to `alt.Chart`. The output plot once again looks as expected.

## Summary

In Altair, data is specified as the first argument of the `alt.Chart` constructor. Altair recognizes several forms of data; the most common are a pandas DataFrame, an Altair `Data` object, and a URL.