
In [None]:
import altair as alt
import pandas as pd
import numpy as np

# Specifying Data

The basic data model used by Altair is tabular data, similar to a spreadsheet or database table. Individual datasets are assumed to contain a collection of records (rows), which may contain any number of named data fields (columns).

In [None]:
data = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E'],
                     'y': [5, 3, 6, 7, 2]})
data

|   | x | y |
|---|---|---|
| 0 | A | 5 |
| 1 | B | 3 |
| 2 | C | 6 |
| 3 | D | 7 |
| 4 | E | 2 |


In [None]:
base = alt.Chart(data)  # step 1: attach the data source

marked = base.mark_bar()  # step 2: choose the mark (one bar per row)

encoded = marked.encode(
    x='x',
    y='y',
)

encoded



In [None]:
alt.Chart(data).mark_bar().encode(
    x='x',
    y='y',
)



# Encoding Data Types

The details of any mapping depend on the type of the data. Altair recognizes five main data types:

| Data Type     | Shorthand Code | Description                          |
|---------------|----------------|--------------------------------------|
| quantitative  | Q              | a continuous real-valued quantity    |
| ordinal       | O              | a discrete ordered quantity          |
| nominal       | N              | a discrete unordered category        |
| temporal      | T              | a time or date value                 |
| geojson       | G              | a geographic shape                   |

For data specified as a DataFrame, Altair can automatically determine the data type for each encoding and create appropriate scales and legends to represent the data.

If types are not specified for data input as a DataFrame, Altair defaults to `quantitative` for any numeric data, `temporal` for date/time data, and `nominal` for string data. However, be aware that these defaults are not always the correct choice!


In [None]:
from vega_datasets import data
url = data.cars.url

alt.Chart(url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q'
)

## Encoding


### Channels

Encoding channels fall into the following groups:

* Position
* Mark property
* Text and tooltip
* Hyperlink
* Detail
* Order
* Facet

Let's take a deeper dive into each group.

### Position


| Channel    | Altair Class | Description                         |
|------------|--------------|-------------------------------------|
| x          | X            | The x-axis value                    |
| y          | Y            | The y-axis value                    |
| x2         | X2           | Second x value for ranges           |
| y2         | Y2           | Second y value for ranges           |
| longitude  | Longitude    | Longitude for geo charts            |
| latitude   | Latitude     | Latitude for geo charts             |
| longitude2 | Longitude2   | Second longitude value for ranges   |
| latitude2  | Latitude2    | Second latitude value for ranges    |
| xError     | XError       | The x-axis error value              |
| yError     | YError       | The y-axis error value              |
| xError2    | XError2      | The second x-axis error value       |
| yError2    | YError2      | The second y-axis error value       |
| xOffset    | XOffset      | Offset to the x position            |
| yOffset    | YOffset      | Offset to the y position            |
| theta      | Theta        | The start arc angle                 |
| theta2     | Theta2       | The end arc angle (in radians)      |


In [None]:
source = pd.DataFrame([
    {"task": "A", "start": 1, "end": 3},
    {"task": "B", "start": 3, "end": 8},
    {"task": "C", "start": 8, "end": 10}
])

alt.Chart(source).mark_bar().encode(
    x='start',
    x2='end',
    y='task'
)





### Mark Property


| Channel       | Altair Class   | Description                            |
|---------------|----------------|----------------------------------------|
| angle         | Angle          | The angle of the mark                  |
| color         | Color          | The color of the mark                  |
| fill          | Fill           | The fill for the mark                  |
| fillOpacity   | FillOpacity    | The opacity of the mark’s fill         |
| opacity       | Opacity        | The opacity of the mark                |
| radius        | Radius         | The radius of the mark                 |
| shape         | Shape          | The shape of the mark                  |
| size          | Size           | The size of the mark                   |
| stroke        | Stroke         | The stroke of the mark                 |
| strokeDash    | StrokeDash     | The stroke dash style                  |
| strokeOpacity | StrokeOpacity  | The opacity of the mark’s stroke       |
| strokeWidth   | StrokeWidth    | The width of the mark’s stroke         |




In [None]:
from vega_datasets import data

source = data.stocks()

alt.Chart(source).mark_line().encode(
    x='date:T',
    y='price:Q',
    # color='symbol:N',  # uncomment to distinguish the lines by color as well
    strokeDash='symbol:N'
)



### Text and Tooltip

In [None]:
source = pd.DataFrame({
    'x': [1, 3, 5, 7, 9],
    'y': [1, 3, 5, 7, 9],
    'label': ['A', 'B', 'C', 'D', 'E']
})

points = alt.Chart(source).mark_point().encode(
    x='x:Q',
    y='y:Q',
    tooltip=['x', 'y'],  # fields shown when hovering over a point
)

text = points.mark_text(
    align='left',
    baseline='middle',
    dx=7
).encode(
    text='label',
)

points + text



### Hyperlink

In [None]:
source = data.cars()

alt.Chart(source).transform_calculate(
    url='https://www.google.com/search?q=' + alt.datum.Name
).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    href='url:N',
    tooltip=['Name:N', 'url:N']
)



### Detail

Grouping data is an important operation in data visualization. For line and area marks, mapping an unaggregated data field to any non-position channel will group the lines and stacked areas by that field. For aggregated plots, all unaggregated fields encoded are used as grouping fields in the aggregation (similar to fields in GROUP BY in SQL).

The `detail` channel specifies an additional grouping field (or fields) for grouping data without mapping the field(s) to any visual properties.

For example, here is a line chart showing the stock prices of five tech companies over time. We map the `symbol` field to `detail` so that the data is grouped into one line per company.
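
To make the GROUP BY analogy concrete, the pandas sketch below (made-up numbers in the shape of the barley dataset) computes what a chart with `x='variety:N'`, `color='site:N'` and `y='sum(yield):Q'` would aggregate: every unaggregated encoded field becomes a grouping key. Adding a `detail` encoding would add one more key to the group without changing any visual property.

```python
import pandas as pd

# Hypothetical long-form data shaped like the barley dataset (values are made up)
df = pd.DataFrame({
    'variety': ['V1', 'V1', 'V1', 'V2', 'V2'],
    'site':    ['S1', 'S1', 'S2', 'S1', 'S2'],
    'yield':   [10, 20, 5, 7, 3],
})

# x='variety:N' and color='site:N' are unaggregated, so both act as grouping
# keys for y='sum(yield):Q', like GROUP BY variety, site in SQL
grouped = df.groupby(['variety', 'site'], as_index=False)['yield'].sum()
print(grouped)
```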



In [None]:
source = data.stocks()
alt.Chart(source).mark_line().encode(
    x="date:T",
    y="price:Q",
    detail="symbol:N"
).interactive()



### Order
The `order` option and `Order` channel can sort how marks are drawn on the chart.

For stacked marks, this controls the order of components of the stack. Here, the elements of each bar are sorted alphabetically by the name of the nominal data in the color channel.

The order can be reversed by changing the sort option to `descending`.




In [None]:
barley = data.barley()

alt.Chart(barley).mark_bar().encode(
    x='variety:N',
    y='sum(yield):Q',
    color='site:N',
    order=alt.Order("site", sort="ascending")
)



For line marks, the Order channel encodes the order in which data points are connected. This can be useful for creating a scatter plot that draws lines between the dots using a different field than the x and y axes.

In [None]:
# Parameters
num_trials = 5
points_per_trial = 50

# Simulate data: each trial is an independent 2-D random walk
np.random.seed(42)
records = {  # named 'records' so the vega_datasets `data` module is not shadowed
    'Trial': [],
    'step': [],      # position within the trial (1..points_per_trial)
    'x': [],
    'y': [],
    'sequence': []   # global drawing order across all trials
}

for trial in range(1, num_trials + 1):
    x = np.cumsum(np.random.randn(points_per_trial))
    y = np.cumsum(np.random.randn(points_per_trial))
    step = np.arange(1, points_per_trial + 1)
    sequence = step + (trial - 1) * points_per_trial
    records['Trial'].extend([trial] * points_per_trial)
    records['step'].extend(step)
    records['x'].extend(x)
    records['y'].extend(y)
    records['sequence'].extend(sequence)

df = pd.DataFrame(records)

# Create a continuous motion trajectory plot
line = alt.Chart(df).mark_line().encode(
    x=alt.X('x:Q', title='X Coordinate'),
    y=alt.Y('y:Q', title='Y Coordinate'),
    order='sequence:Q',  # ensures the points are connected in the right order
    detail='Trial:N',    # one line per trial, instead of joining the end of one trial to the next
    color=alt.value('blue')  # set a fixed color for the line
)

# Mark the start point of every trial in green
start_points = alt.Chart(df).mark_point(color='green', size=100).encode(
    x='x:Q',
    y='y:Q'
).transform_filter(
    alt.datum.step == 1
)

# Mark the end point of every trial in red
end_points = alt.Chart(df).mark_point(color='red', size=100).encode(
    x='x:Q',
    y='y:Q'
).transform_filter(
    alt.datum.step == points_per_trial
)

# Combine the layers, then set title, size and interactivity once on the layered chart
final_chart = (line + start_points + end_points).properties(
    title='Continuous Motion Trajectory in Morris Water Maze',
    width=600,
    height=400
).interactive()  # enable pan and zoom

# Display the plot
final_chart

In [None]:
final_chart.save('chart.html', embed_options={'renderer':'svg'})


### Facet


| Channel | Altair Class | Description                            |
|---------|--------------|----------------------------------------|
| column  | Column        | The column of a faceted plot           |
| row     | Row           | The row of a faceted plot              |
| facet   | Facet         | The row and/or column of a general faceted plot |


Becker’s Barley Faceted Plot:

The example demonstrates the faceted charts created by Richard Becker, William Cleveland and others in the 1990s. Using a visualization in which each row is a different site (i.e., the chart is faceted by site), they identified an anomaly in a widely used agricultural dataset, where the “Morris” site accidentally had the years 1931 and 1932 swapped. They named this “The Morris Mistake.”

In [None]:
from vega_datasets import data

source = data.barley()

alt.Chart(source, title="The Morris Mistake").mark_point().encode(
    alt.X('yield:Q', title="Barley Yield (bushels/acre)", scale=alt.Scale(zero=False), axis=alt.Axis(grid=False)),  # hide x gridlines; don't force zero into the scale
    alt.Y('variety:N', title="", sort="-x", axis=alt.Axis(grid=True)),  # varieties sorted by descending yield, gridlines as guides
    alt.Color('year:N', title="Year"),
    alt.Row('site:N',title="", sort={'field':'yield', 'op':'sum', 'order':'descending'})
).properties(
    height=alt.Step(20)
).configure_view(stroke="transparent")



In [None]:
source = data.population.url

alt.Chart(source).mark_area().encode(
    x='age:O',
    y=alt.Y('sum(people):Q', title='Population'),
    facet=alt.Facet('year:O', columns=5),  # wrap the year panels into 5 columns
).properties(
    title='US Age Distribution By Year',
    width=90,
    height=80
)

Some encoding channels allow for additional options to be expressed. These can control things like axis properties, scale properties, headers and titles, binning parameters, aggregation, sorting, and many more.

The following page lists the options accepted by each of the channels introduced in the Channels section above:
[Channel Options](https://altair-viz.github.io/user_guide/encodings/channel_options.html)

# Marks

We saw in Encodings that the `encode()` method is used to map columns to visual attributes of the plot. The mark property is what specifies how exactly those attributes should be represented on the plot.

Altair supports the following primitive mark types:

### Table 1: Altair Marks

| Mark       | Method           | Description                                |
|------------|------------------|--------------------------------------------|
| **Arc**    | `mark_arc()`      | A pie chart.                               |
| **Area**   | `mark_area()`     | A filled area plot.                        |
| **Bar**    | `mark_bar()`      | A bar plot.                                |
| **Circle** | `mark_circle()`   | A scatter plot with filled circles.        |
| **Geoshape** | `mark_geoshape()` | Visualization containing spatial data      |
| **Image**  | `mark_image()`    | A scatter plot with image markers.         |
| **Line**   | `mark_line()`     | A line plot.                               |
| **Point**  | `mark_point()`    | A scatter plot with configurable point shapes. |
| **Rect**   | `mark_rect()`     | A filled rectangle, used for heatmaps.     |
| **Rule**   | `mark_rule()`     | A vertical or horizontal line spanning the axis. |
| **Square** | `mark_square()`   | A scatter plot with filled squares.        |
| **Text**   | `mark_text()`     | A scatter plot with points represented by text. |
| **Tick**   | `mark_tick()`     | A vertical or horizontal tick mark.        |
| **Trail**  | `mark_trail()`    | A line with variable widths.               |

---

### Table 2: Altair Composite Marks

| Mark Name  | Method           | Description                               |
|------------|------------------|-------------------------------------------|
| **Box Plot**    | `mark_boxplot()`  | A box plot.                                |
| **Error Band**  | `mark_errorband()`| A continuous band around a line.           |
| **Error Bar**   | `mark_errorbar()` | An error bar around a point.               |

---

### Additional Information

In Altair, marks can be most conveniently specified by the `mark_*` methods of the `Chart` object (e.g., `mark_bar()`), which take optional keyword arguments to configure the look of the marks.

See the [Mark properties](https://altair-viz.github.io/user_guide/marks/index.html) documentation for the full list of options accepted by each mark.

# Data Transformation



In [None]:
data = pd.DataFrame({'x': np.arange(0, 10, 0.1)})

alt.Chart(data).transform_calculate(
    y='sin(datum.x)'
).mark_line().encode(
    x='x:Q',
    y='y:Q',
)

`transform_calculate(y='sin(datum.x)')` creates a new calculated field within the chart. It takes a string expression and evaluates it once for each row in the dataset:

* `y='sin(datum.x)'` creates a new field `y` whose value in each row is the sine of that row's `x` value.
* `datum` refers to the current data point (row), so `datum.x` accesses the `x` value of that point.
* `sin(datum.x)` is evaluated in the browser by the Vega expression language, which uses JavaScript's `Math.sin` behind the scenes.

It is often necessary to transform or filter data in the process of visualizing it. In Altair you can do this one of two ways:

1. Before the chart definition, using standard pandas data transformations.

2. Within the chart definition, using Vega-Lite’s data transformation tools.

In most cases, we suggest that you use the first approach, because it is more straightforward to those who are familiar with data manipulation in Python, and because the pandas package offers much more flexibility than Vega-Lite in available data manipulations.

The second approach becomes useful when the data source is not a dataframe, but, for example, a URL pointer to a JSON or CSV file. It can also be useful in a compound chart where different views of the dataset require different transformations.

This second approach – specifying data transformations within the chart specification itself – can be accomplished using the `transform_*` methods of top-level objects:



---

### Altair Transform Methods

| **Transform**      | **Method**              | **Description**                                           |
|--------------------|-------------------------|-----------------------------------------------------------|
| **Aggregate**      | `transform_aggregate()` | Create a new data column by aggregating an existing column. |
| **Bin**            | `transform_bin()`       | Create a new data column by binning an existing column.   |
| **Calculate**      | `transform_calculate()` | Create a new data column using an arithmetic calculation on an existing column. |
| **Density**        | `transform_density()`   | Create a new data column with the kernel density estimate of the input. |
| **Extent**         | `transform_extent()`    | Find the extent of a field and store the result in a parameter. |
| **Filter**         | `transform_filter()`    | Select a subset of data based on a condition.            |
| **Flatten**        | `transform_flatten()`   | Flatten array data into columns.                         |
| **Fold**           | `transform_fold()`      | Convert wide-form data into long-form data (opposite of pivot). |
| **Impute**         | `transform_impute()`    | Impute missing data.                                    |
| **Join Aggregate** | `transform_joinaggregate()` | Aggregate transform joined to original data.         |
| **LOESS**          | `transform_loess()`     | Create a new column with LOESS smoothing of data.         |
| **Lookup**         | `transform_lookup()`    | One-sided join of two datasets based on a lookup key.     |
| **Pivot**          | `transform_pivot()`     | Convert long-form data into wide-form data (opposite of fold). |
| **Quantile**       | `transform_quantile()`  | Compute empirical quantiles of a dataset.                |
| **Regression**     | `transform_regression()`| Fit a regression model to a dataset.                     |
| **Sample**         | `transform_sample()`    | Random sub-sample of the rows in the dataset.            |
| **Stack**          | `transform_stack()`     | Compute stacked version of values.                       |
| **TimeUnit**       | `transform_timeunit()`  | Discretize/group a date by a time unit (day, month, year, etc.) |
| **Window**         | `transform_window()`    | Compute a windowed aggregation.                          |

---

In [None]:
from vega_datasets import data  # re-import: `data` was reassigned to a DataFrame above

cars = data.cars.url
chart = alt.Chart(cars).mark_bar().encode(
    y='Cylinders:O',
    x='mean_acc:Q'
).transform_aggregate(
    mean_acc='mean(Acceleration)',
    groupby=["Cylinders"]
)
chart

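The same chart can also be written without `transform_aggregate`, by putting the aggregate inside the encoding (`y='Cylinders:O'`, `x='mean(Acceleration):Q'`); the unaggregated `Cylinders` field then acts as the grouping field. The Altair shorthand string

```
x='mean(Acceleration):Q'
```

is provided for convenience and is equivalent to the longer form

```
x=alt.X(field='Acceleration', aggregate='mean', type='quantitative')
```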


### Calculate

In [None]:
data = pd.DataFrame({'t': range(101)})

alt.Chart(data).mark_line().encode(
    x='x:Q',
    y='y:Q',
    order='t:Q'
).transform_calculate(
    x='cos(datum.t * PI / 50)',
    y='sin(datum.t * PI / 25)'
)

### Regression

In [None]:
np.random.seed(42)
x = np.linspace(0, 10)
y = x - 5 + np.random.randn(len(x))

df = pd.DataFrame({'x': x, 'y': y})

chart = alt.Chart(df).mark_point().encode(
    x='x',
    y='y'
)

chart + chart.transform_regression('x', 'y').mark_line()
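
`transform_regression` fits a linear model by default, so the drawn line should agree with an ordinary least-squares fit. As a cross-check, a sketch using NumPy's `polyfit` on the same simulated data; since the data were generated as `y = x - 5` plus noise, the slope should come out near 1 and the intercept near -5.

```python
import numpy as np

np.random.seed(42)
x = np.linspace(0, 10)
y = x - 5 + np.random.randn(len(x))

# Least-squares fit of a degree-1 polynomial; returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 2), round(intercept, 2))
```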

### Sample

In [None]:
import altair as alt
from vega_datasets import data

source = data.cars.url

chart = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).properties(
    width=200,
    height=200
)

chart | chart.transform_sample(100)
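
For comparison, a sketch of the pandas counterpart on a hypothetical 400-row table. `DataFrame.sample` with `random_state` gives a reproducible subset in Python, whereas `transform_sample` draws its sample in the browser when the chart is rendered.

```python
import numpy as np
import pandas as pd

# Hypothetical table with 400 rows
df = pd.DataFrame({'row_id': np.arange(400)})

# pandas counterpart of transform_sample(100); random_state makes the draw reproducible
subset = df.sample(n=100, random_state=0)
print(len(subset))  # 100
```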

### Other types of data transformation
See the [Altair data transformation guide](https://altair-viz.github.io/user_guide/transform/index.html) for the full list of transforms.

# Interactive Charts

One of the unique features of Altair, inherited from Vega-Lite, is a declarative grammar of not just visualization, but also interaction. This is both convenient and powerful, as we will see in this section. There are three core concepts of this grammar:

* Parameters are the basic building blocks in the grammar of interaction. They can either be simple variables or more complex selections that map user input (e.g., mouse clicks and drags) to data queries.
* Conditions and filters can respond to changes in parameter values and update chart elements based on that input.
* Widgets and other chart input elements can bind to parameters so that charts can be manipulated via drop-down menus, radio buttons, sliders, legends, etc.

### Selections: Capturing Chart Interactions
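Selection parameters define data queries that are driven by interactive manipulation of the chart by the user (e.g., via mouse clicks or drags). The code in this notebook uses the Altair 4 API, which provides three selection types: `selection_interval()`, `selection_single()` and `selection_multi()`. In Altair 5, `selection_single()` and `selection_multi()` are superseded by `selection_point()`, and `add_selection()` by `add_params()`.

Here we will create a simple chart and then add an interval selection to it. In Altair 5, an interval selection can also be created with `alt.param(select="interval")`, but the shorter `alt.selection_interval()` is more convenient.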

Here is a simple scatter-plot created from the cars dataset:




In [None]:
cars = data.cars.url

alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)


In [None]:
brush = alt.selection_interval()


In [None]:
alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).add_selection(
    brush
)

### Conditional Encodings
The example above is neat, but the selection interval doesn’t actually do anything yet. To make the chart respond to this selection, we need to reference the selection within the chart specification. Here, we will use the `condition()` function to create a conditional color encoding: we’ll tie the color to the `"Origin"` column for points in the selection, and set the color to `"lightgray"` for points outside the selection:


In [None]:
alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).add_selection(
    brush
)

This approach becomes even more powerful when the selection behavior is tied across multiple views of the data within a compound chart. For example, here we create a `chart` object using the same code as above, and horizontally concatenate two versions of this chart: one with the x-encoding tied to `"Horsepower"`, and one with the x-encoding tied to `"Acceleration"`.


In [None]:
chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).properties(
    width=250,
    height=250
).add_selection(
    brush
)

chart | chart.encode(x='Acceleration:Q')

Because both copies of the chart reference the same selection object, the renderer ties the selections together across panels, leading to a dynamic display that helps you gain insight into the relationships within the dataset.

Each selection type has attributes through which its behavior can be customized; for example, we might wish for our brush to be tied only to the `"x"` encoding to emphasize that feature in the data. We can modify the brush definition and leave the rest of the code unchanged:

In [None]:
brush = alt.selection_interval(encodings=['x'])

chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).properties(
    width=250,
    height=250
).add_selection(
    brush
)

chart | chart.encode(x='Acceleration:Q')

As you might have noticed, the selected points are sometimes obscured by some of the unselected points. To bring the selected points to the foreground, we can change the order in which they are drawn with an encoding such as `order=alt.condition(hover, alt.value(1), alt.value(0))`, where `hover` is a selection parameter (not defined in this notebook). You can see an example of this in the Selection zorder gallery example.



### Filtering Data
Using a selection parameter to filter data works in much the same way as using it within `condition`. For example, in `transform_filter(brush)`, we are again using the selection parameter `brush` as a predicate. Data points which evaluate to `True` (i.e., data points which lie within the selection) are kept, and data points which evaluate to `False` are filtered out.

It is not possible to both select and filter in the same chart, so typically this functionality will be used when at least two sub-charts are present. In the following example, we attach the selection parameter to the upper chart, and then filter data in the lower chart based on the selection in the upper chart. You can explore how the counts change in the bar chart depending on the size and position of the selection in the scatter plot.
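
Conceptually, the filter keeps the rows that fall inside the brushed rectangle. The pandas sketch below mimics that predicate with hypothetical rows and hypothetical brush bounds (in the real chart the bounds come from the user's drag and are evaluated in the browser), then counts rows per `Origin` the way the bar chart does.

```python
import pandas as pd

# Hypothetical rows standing in for the cars dataset
cars_df = pd.DataFrame({
    'Horsepower':       [90, 150, 200, 70],
    'Miles_per_Gallon': [30, 18, 13, 35],
    'Origin':           ['Japan', 'USA', 'USA', 'Europe'],
})

# Hypothetical brush extent over the x and y encodings
hp_range = (80, 180)
mpg_range = (15, 32)

# Predicate: True for rows inside the brushed interval on both axes
in_brush = (cars_df['Horsepower'].between(*hp_range)
            & cars_df['Miles_per_Gallon'].between(*mpg_range))

# Count the remaining rows per Origin, like x='count()', y='Origin:N'
counts = cars_df[in_brush].groupby('Origin').size()
print(counts)
```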



In [None]:
brush = alt.selection_interval()

points = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).add_selection(
    brush
)

bars = alt.Chart(cars).mark_bar().encode(
    x='count()',
    y='Origin:N',
    color='Origin:N'
).transform_filter(
    brush
)

points & bars

### Selection types
An interval selection allows you to select chart elements by clicking and dragging. You can create such a selection using the `selection_interval()` function:



In [None]:
def make_example(selector):
    cars = data.cars.url

    return alt.Chart(cars).mark_rect().encode(
        x="Cylinders:O",
        y="Origin:N",
        color=alt.condition(selector, 'count()', alt.value('lightgray'))
    ).properties(
        width=300,
        height=180
    ).add_selection(
        selector
    )

In [None]:
interval = alt.selection_interval()
make_example(interval)

In [None]:
interval_x = alt.selection_interval(encodings=['x'])
make_example(interval_x)

In [None]:
point = alt.selection_single()
make_example(point)

In [None]:
point_nearest = alt.selection_single(on='pointerover', nearest=True)
make_example(point_nearest)

### Selection Targets
For any but the simplest selections, the user needs to think about exactly what is targeted by the selection, and this can be controlled with either the `fields` or `encodings` arguments. These control what data properties are used to determine which points are part of the selection.

For example, here we create a small chart that acts as an interactive legend, by targeting the Origin field using `fields=['Origin']`. Clicking on points in the upper-right plot (the legend) will propagate a selection for all points with a matching `Origin`.




In [None]:
selection = alt.selection_multi(fields=['Origin'])
color = alt.condition(
    selection,
    alt.Color('Origin:N',legend=None),
    alt.value('lightgray')
)

scatter = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=color,
    tooltip='Name:N'
)

legend = alt.Chart(cars).mark_point().encode(
    alt.Y('Origin:N'),
    color=color
).add_selection(
    selection
)

scatter | legend

# Bindings & Widgets

In [None]:
selection = alt.selection_multi(fields=['Origin', 'Cylinders'])
color = alt.condition(
    selection,
    alt.Color('Origin:N',legend=None),
    alt.value('lightgray')
)

scatter = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=color,
    tooltip='Name:N'
)

legend = alt.Chart(cars).mark_rect().encode(
    alt.Y('Origin:N'),
    x='Cylinders:O',
    color=color
).add_selection(
    selection
)

scatter | legend

## Data-Driven Lookups

In [None]:
cars = data.cars()

input_dropdown = alt.binding_select(options=cars['Origin'].unique().tolist(), name='Region')
selection = alt.selection_single(fields=['Origin'], bind=input_dropdown, init={'Origin': 'USA'})  # init keys are field names
color = alt.condition(
    selection,
    alt.Color('Origin:N'),
    alt.value('lightgray')
)

alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=color,
).add_selection(
    selection
)



## Data-Driven Comparisons

In [None]:
# Generate data
rand = np.random.RandomState(42)
df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

# Define a range slider for continuous selection
slider = alt.binding_range(min=0, max=100, step=1, name='Cutoff ')
selector = alt.selection_single(fields=['xval'], bind=slider, name='SelectorName', init={'xval': 50})

# Create the chart with conditional coloring based on the slider value
chart = alt.Chart(df).mark_point().encode(
    x='xval:Q',
    y='yval:Q',
    color=alt.condition(
        alt.datum.xval < selector.xval,
        alt.value('red'),
        alt.value('blue')
    )
).add_selection(
    selector
)

chart

# Saving

In [None]:
chart.save('chart.html')

# Overview

In [None]:
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [10, 20, 30, 40]})

selection = alt.selection_single(
    fields=['Category'],  # Field to base the selection on
    bind='legend')  # Bind selection to legend



(alt.Chart(data)
  .mark_bar(# Specifying chart type as bar chart
            size=50,  # Width of bars
            color='steelblue',  # Fallback bar color (overridden by the color encoding below)
            opacity=0.7)  # Transparency of bars
  .encode(
      x=alt.X('Category:N',  # 'N' for nominal (categorical) data type
              title='Category',  # Title of the x-axis
              axis=alt.Axis(labelAngle=0)),  # Control label angle
      y=alt.Y('Value:Q',  # 'Q' for quantitative (numerical) data type
              title='Value',
              scale=alt.Scale(domain=[0, 50])),  # Custom y-axis scale
      color=alt.Color('Category:N',  # Color encoding based on category
                      legend=alt.Legend(title="Category Legend")),  # Legend customization
      tooltip=[alt.Tooltip('Category:N', title='Category'),  # Tooltip for interactivity
               alt.Tooltip('Value:Q', title='Value')])
  .properties(
      width=600,  # Width of the chart
      height=400,  # Height of the chart
      title=alt.TitleParams(
          text="Sample Bar Chart",  # Main title
          subtitle="Category vs Value",  # Subtitle
          anchor='middle',  # Positioning of the title
          fontSize=18),  # Font size of the title
      description="A bar chart showing the relationship between categories and their corresponding values.")
  .configure_axis(
      grid=True,  # Enable gridlines
      labelFontSize=12,  # Font size of axis labels
      titleFontSize=14,  # Font size of axis titles
      gridColor='gray',  # Color of gridlines
      gridOpacity=0.3)  # Transparency of gridlines
  .configure_legend(
      titleFontSize=14,  # Font size of legend title
      labelFontSize=12,  # Font size of legend labels
      symbolSize=100,  # Size of legend symbols
      symbolType='circle',  # Shape of legend symbols
      orient='right')  # Position of the legend
  .add_selection(selection)
  .transform_filter(selection))









In [None]:
# Sample DataFrame
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [10, 20, 30, 40]
})

# Base Chart Creation
base_chart = alt.Chart(data)

base_chart




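Displaying a chart that has data but no mark fails schema validation, because `mark` is a required property of a Vega-Lite unit specification:

SchemaValidationError: Invalid specification

        altair.vegalite.v4.api.Chart, validating 'required'

        'mark' is a required property

alt.Chart(...)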

In [None]:
# Chart Type (Mark)
chart_with_mark = base_chart.mark_bar(  # Specifying chart type as bar chart
    size=50,  # Width of bars
    color='steelblue',  # Default color of bars
    opacity=0.7  # Transparency of bars
)
chart_with_mark



In [None]:
# Encoding Data
encoded_chart = chart_with_mark.encode(
    x=alt.X('Category:N',  # 'N' for nominal (categorical) data type
            title='Category',  # Title of the x-axis
            axis=alt.Axis(labelAngle=0)  # Control label angle
            ),
    y=alt.Y('Value:Q',  # 'Q' for quantitative (numerical) data type
            title='Value',
            scale=alt.Scale(domain=[0, 50])  # Custom y-axis scale
            ),
    color=alt.Color('Category:N',  # Color encoding based on category
                    legend=alt.Legend(title="Category Legend")  # Legend customization
                    ),
    tooltip=[alt.Tooltip('Category:N', title='Category'),  # Tooltip for interactivity
             alt.Tooltip('Value:Q', title='Value')]
)

encoded_chart




In [None]:

# Adding Titles and Descriptions
final_chart = encoded_chart.properties(  # build on the encoded chart so the encodings are kept
    width=600,  # Width of the chart
    height=400,  # Height of the chart
    title=alt.TitleParams(
        text="Sample Bar Chart",  # Main title
        subtitle="Category vs Value",  # Subtitle
        anchor='middle',  # Positioning of the title
        fontSize=18  # Font size of the title
    ),
    description="A bar chart showing the relationship between categories and their corresponding values."
)
final_chart



In [None]:

# Customizing Axis Labels and Gridlines
final_chart = final_chart.configure_axis(
    grid=True,  # Enable gridlines
    labelFontSize=12,  # Font size of axis labels
    titleFontSize=14,  # Font size of axis titles
    gridColor='gray',  # Color of gridlines
    gridOpacity=0.3  # Transparency of gridlines
)
final_chart



In [None]:

# Customizing the Legend
final_chart = final_chart.configure_legend(
    titleFontSize=14,  # Font size of legend title
    labelFontSize=12,  # Font size of legend labels
    symbolSize=100,  # Size of legend symbols
    symbolType='circle',  # Shape of legend symbols
    orient='right',  # Position of the legend
)
final_chart



In [None]:

# Adding Interactivity: Selection
selection = alt.selection_single(
    fields=['Category'],  # Field to base the selection on
    bind='legend',  # Bind selection to legend
    #empty='none',  # Behavior when no selection is made
)

# Applying Selection to Chart
interactive_chart = final_chart.add_selection(
    selection
).transform_filter(
    selection
)

# Display the chart
interactive_chart.display()


