In [5]:
import pandas as pd
import altair as alt

# Load the licenses dataset (local file path)
licenses_url = "licenses_fall.csv"
licenses = pd.read_csv(licenses_url)

# Inspect the first few rows to understand the structure of the dataset
# (print is needed because a cell only displays its last expression)
print(licenses.head())

# 1. Degree Distribution Plot (Static Plot)
# Group by user (first name + last name) and count licenses per user
degree_count = licenses.groupby(['First Name', 'Last Name']).size().reset_index(name='license_count')

# Keep only users with more than one license (adjust the threshold as needed)
degree_count_filtered = degree_count[degree_count['license_count'] > 1]

# Create the plot
plot1 = alt.Chart(degree_count_filtered).mark_bar().encode(
    x=alt.X('license_count', bin=alt.Bin(maxbins=50), title='License Count (Degree)'),
    y=alt.Y('count()', title='Number of Users'),
    color=alt.value("steelblue")
).properties(
    title='License Count Distribution of Users'
)

# Display plot1
plot1



Plot 1: Degree Distribution Plot (Static Plot)
This bar chart shows the distribution of license counts among users, restricted to users who hold more than one license. The x-axis shows license counts grouped into uniform bins (at most 50), which simplifies the chart and makes patterns in the data easier to see. The y-axis shows the number of users in each bin, so the chart shows how licenses are distributed across individuals. A single color, steel blue, keeps the chart clear and free of distraction. To prepare the data, the records were grouped by user name (first and last) and the number of licenses per user was counted; users holding only one license were then filtered out so that the chart focuses on users with multiple licenses.
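The grouping and filtering step can be checked on a small table before running it on the full dataset. The sketch below uses hypothetical toy records (the names and license types are made up for illustration); only the column names `First Name` and `Last Name` and the `> 1` threshold come from the cell above.

```python
import pandas as pd

# Hypothetical toy records; the real dataset's rows will differ.
toy = pd.DataFrame({
    "First Name": ["Ana", "Ana", "Ben", "Cy", "Cy", "Cy"],
    "Last Name": ["Lee", "Lee", "Ortiz", "Park", "Park", "Park"],
    "License Type": ["A", "B", "A", "A", "B", "C"],
})

# One row per (first name, last name) pair with its number of license records.
toy_counts = (
    toy.groupby(["First Name", "Last Name"])
    .size()
    .reset_index(name="license_count")
)

# Keep only users holding more than one license.
toy_multi = toy_counts[toy_counts["license_count"] > 1]
print(toy_multi)
```

Because the grouping key is the name pair, two different people who share a first and last name would be counted as one user.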



In [8]:
import pandas as pd

# Select a sample of the data to avoid MaxRowsError
# (a fixed random_state makes the sample reproducible)
licenses_sample = licenses.sample(n=500, random_state=42)  # Adjust n based on your needs

# Ensure 'License Number' is numeric, coercing errors to NaN
licenses_sample['License Number'] = pd.to_numeric(licenses_sample['License Number'], errors='coerce')

# Drop rows with NaN in 'License Number' (if coercion fails)
licenses_sample = licenses_sample.dropna(subset=['License Number'])

# Convert 'License Number' to integer (optional, depending on the data)
licenses_sample['License Number'] = licenses_sample['License Number'].astype(int)

# Add a binned version of 'License Number' for systematic grouping
licenses_sample['License Number (Binned)'] = pd.cut(
    licenses_sample['License Number'],
    bins=20,  # Adjust the number of bins as needed
    labels=False
)

# Get unique license types for the dropdown filter
unique_licenses = licenses_sample['License Type'].dropna().unique().tolist()

# Dropdown selection for specific license type
license_dropdown = alt.binding_select(options=unique_licenses, name="Select License Type: ")
# A point selection's initial value maps each selected field to its value
license_select = alt.selection_point(
    fields=['License Type'],
    bind=license_dropdown,
    value=[{'License Type': unique_licenses[0]}]
)

# Brush selection to highlight data points within selected region
brush = alt.selection_interval(encodings=['x', 'y'])

base = alt.Chart(licenses_sample).mark_circle(size=60).encode(
    # The column already holds the bin index, so no further binning is applied
    x=alt.X('License Number (Binned):Q', title='License Number (Binned)'),
    y=alt.Y('License Number:Q', title='Original License Number'),
    color=alt.condition(license_select, 'License Type:N', alt.value('lightgray')),
    # Fade points that fall outside the brushed region
    opacity=alt.condition(brush, alt.value(1.0), alt.value(0.3)),
    tooltip=['License Number', 'License Type', '_id']
).properties(
    width=600,
    height=400,
    title='Interactive Scatter Plot of License Types (Binned) with Brush and Dropdown Filter'
).add_params(
    brush,
    license_select
)

# Display the interactive plot. Pan and zoom via .interactive() is left out
# because its drag handler would compete with the brush selection.
plot2 = base
plot2



Plot 2: Interactive Scatter Plot of License Types (Binned)
This scatter plot shows binned license numbers against their original values and adds interactivity by license type. The x-axis shows license numbers grouped into 20 bins, while the y-axis shows the original license numbers. Each point is colored by its license type, which distinguishes the categories. Points outside the license type chosen in the dropdown are grayed out, so the visualization emphasizes the selected subset. Tooltips show the license number, license type, and record ID for each point. Data preparation: invalid values in the License Number column were converted to NaN, rows containing them were dropped, and the column was cast to integer type. A random sample of 500 rows was taken to keep the chart manageable and to avoid Altair's MaxRowsError. Binning groups the license numbers systematically, which makes the visualization clearer and easier to read.
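The data preparation described above can be tried on a few values to see what each step does. This is a minimal sketch with hypothetical inputs (the strings below are invented, and 3 bins are used instead of 20 to keep the output small); the calls themselves mirror the ones in the cell above.

```python
import pandas as pd

# Hypothetical raw values, including two entries that are not numbers.
toy = pd.DataFrame(
    {"License Number": ["100", "250", "n/a", "390", "unknown", "1000"]}
)

# Invalid strings become NaN instead of raising an error.
toy["License Number"] = pd.to_numeric(toy["License Number"], errors="coerce")

# Drop the rows that failed conversion, then store the rest as integers.
toy = toy.dropna(subset=["License Number"])
toy["License Number"] = toy["License Number"].astype(int)

# labels=False returns the integer index of the equal-width bin for each value.
toy["License Number (Binned)"] = pd.cut(toy["License Number"], bins=3, labels=False)
print(toy)
```

Since `pd.cut` with an integer `bins` splits the observed range into equal-width intervals, a few very large license numbers can leave most rows in the lowest bins.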

Discussion of Interactivity
Interactivity adds to the scatter plot's clarity and appeal. The dropdown menu filters by a specific license type, which helps the viewer focus on one categorical group. The brush selection lets users highlight a specific region of the data, which makes it easier to see trends or patterns that are not readily apparent. Together, these interactions make the visualization more dynamic and help users explore the relationships in the dataset.

<div class="left">
  {% include elements/button.html link="https://github.com/username/repository/raw/main/licenses_fall.csv" text="The Data" %}
</div>

<div class="right">
  {% include elements/button.html link="https://github.com/sanyamii/repository/blob/main/notebooks/your_notebook.ipynb" text="The Analysis" %}
</div>



