### 1.1 – Data Preview and Attribute Description
Dataset: Company Growth (2018 – 2030)

**Attributes**
- `Year` – Time period of observation  
- `Revenue` – Total company revenue in millions  
- `Expenses` – Total expenses in millions  
- `Profit` – Net profit in millions  
- `Employees` – Number of employees (head-count)


In [7]:
import pandas as pd
import altair as alt

# Lift Altair's default 5,000-row limit on embedded datasets
alt.data_transformers.disable_max_rows()

url = "https://raw.githubusercontent.com/sowjanyateppali/company-growth/main/public/company_growth.csv"
data = pd.read_csv(url)
data.head()


   Year  Revenue  Expenses  Profit  Employees
0  2018     45.3      31.7    13.6        250
1  2019     50.1      34.9    15.2        270
2  2020     47.8      36.4    11.4        260
3  2021     55.6      39.3    16.3        300
4  2022     60.9      42.8    18.1        320
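
The previewed rows suggest that `Profit` is simply `Revenue` minus `Expenses`. A minimal sketch to check that, assuming only the five rows shown in the preview (the `preview` frame and the 0.05 tolerance are illustrative choices, not part of the original notebook):

```python
import pandas as pd

# Only the five rows shown in the preview; the full CSV is not reproduced here.
preview = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021, 2022],
    "Revenue": [45.3, 50.1, 47.8, 55.6, 60.9],
    "Expenses": [31.7, 34.9, 36.4, 39.3, 42.8],
    "Profit": [13.6, 15.2, 11.4, 16.3, 18.1],
    "Employees": [250, 270, 260, 300, 320],
})

# Per-row gap between the reported Profit and Revenue - Expenses.
gap = (preview["Revenue"] - preview["Expenses"] - preview["Profit"]).abs()

# Assumed tolerance: half of the last printed decimal digit.
profit_is_derived = bool((gap < 0.05).all())
print(profit_is_derived)
```

If the identity also holds for the remaining years, any chart that stacks Revenue together with Expenses and Profit reaches a total height of twice the revenue, which is worth keeping in mind when reading the stacked charts in this notebook.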


In [8]:
scatter = (
    alt.Chart(data)
    .mark_circle(opacity=0.8)
    .encode(
        x=alt.X('Revenue:Q', title='Revenue (Millions)'),
        y=alt.Y('Profit:Q', title='Profit (Millions)'),
        color=alt.Color('Year:O', title='Year'),
        size=alt.Size('Employees:Q', title='Employees'),
        tooltip=['Year', 'Revenue', 'Profit', 'Employees']
    )
    .properties(width=550, height=350, title='1.2 – Revenue vs Profit (size = Employees, color = Year)')
)
scatter


In [9]:
scatter_opt = (
    alt.Chart(data)
    .mark_circle(opacity=0.8, stroke='black', strokeWidth=0.3)
    .encode(
        x=alt.X('Revenue:Q', title='Revenue (Millions, log scale)',
                scale=alt.Scale(type='log', nice=True, clamp=True)),
        y=alt.Y('Profit:Q', title='Profit (Millions)', scale=alt.Scale(zero=False)),
        color=alt.Color('Employees:Q', title='Employee Count',
                        scale=alt.Scale(scheme='blues')),
        size=alt.Size('Profit:Q', title='Profit (Millions)', scale=alt.Scale(range=[30,300])),
        tooltip=['Year','Revenue','Profit','Employees']
    )
    .properties(width=550, height=350, title='1.3 – Optimized Scatter (Log X, Refined Size)')
)
scatter_opt


### Why Optimized
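- **Log X-axis** compresses large revenues and spreads out small ones, so differences among the smaller revenues are easier to see.
- **Removed zero Y baseline** (`zero=False`) so the observed profit range fills the plot and year-to-year variation is more visible.
- **Size = Profit** reinforces the y-position, so high-profit years stand out at a glance; head-count moves to the color channel.
- **Color gradient** by employees replaces a 13-entry year legend with a single continuous scale, which reduces clutter.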
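
Whether a log axis pays off depends on how widely the values spread. The sketch below is one rough way to check, using only the five previewed revenue figures; the factor-of-ten threshold is a common rule of thumb and an assumption here, not something taken from the dataset.

```python
import pandas as pd

# Revenue values from the five previewed rows only (2018-2022).
revenue = pd.Series([45.3, 50.1, 47.8, 55.6, 60.9], name="Revenue")

# A log scale requires strictly positive values.
all_positive = bool((revenue > 0).all())

# Ratio of the largest to the smallest value: a crude measure of spread.
ratio = float(revenue.max() / revenue.min())

# Assumed rule of thumb: a log axis helps most when values span 10x or more.
spans_order_of_magnitude = ratio >= 10

print(all_positive, round(ratio, 2), spans_order_of_magnitude)
```

On these five rows the spread is well under a factor of ten, so the log axis changes the picture only slightly; the full 2018–2030 data may differ.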


In [17]:
import altair as alt

# Reshape the data into long form: one row per (Year, Metric) pair
stacked_df = data.melt(
    id_vars='Year',
    value_vars=['Revenue', 'Expenses', 'Profit'],
    var_name='Metric',
    value_name='Value'
)

# Build the stacked bar chart from the long-form data
stacked = (
    alt.Chart(stacked_df)
    .mark_bar()
    .encode(
        x=alt.X('Year:O', title='Year'),
        y=alt.Y('Value:Q', title='Amount (Millions)', stack='zero'),
        color=alt.Color('Metric:N', title='Metric'),
        tooltip=['Year', 'Metric', 'Value']
    )
    .properties(width=550, height=350, title='2 – Stacked Bar: Revenue, Expenses & Profit')
)

stacked


In [12]:
# Prepare data first by folding columns
grouped_df = data.melt(id_vars='Year', value_vars=['Profit', 'Expenses'],
                       var_name='Metric', value_name='Value')

# Create grouped bar chart
grouped = (
    alt.Chart(grouped_df)
    .mark_bar()
    .encode(
        x=alt.X('Year:O', title='Year'),
        y=alt.Y('Value:Q', title='Amount (Millions)'),
        color=alt.Color('Metric:N', title='Metric'),
        xOffset='Metric:N',
        tooltip=['Year', 'Metric', 'Value']
    )
    .properties(width=550, height=350,
                title='2 – Grouped Bar: Profit vs Expenses by Year')
)
grouped


In [13]:
area_df = data.melt(id_vars='Year', value_vars=['Revenue','Expenses','Profit'],
                    var_name='Metric', value_name='Value')

area = (
    alt.Chart(area_df)
    .mark_area(opacity=0.7)
    .encode(
        x=alt.X('Year:O', axis=alt.Axis(labelAngle=0, labelOverlap=False)),
        y=alt.Y('Value:Q', title='Amount (Millions)', stack='zero'),
        color=alt.Color('Metric:N', title='Metric'),
        tooltip=['Year','Metric','Value']
    )
    .properties(width=600, height=350, title='3 – Stacked Area Chart: Company Metrics Over Time')
)
area


### Area Chart Discussion
Revenue and Profit dip in 2020; after that, all three metrics grow steadily.  
The Profit band widens over time because Revenue expands faster than Expenses,  
which points to stronger profitability.
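
The profitability claim can be made concrete as a profit margin (`Profit / Revenue`) and as year-over-year growth rates. A sketch on the five previewed rows only; the column names `Margin`, `RevenueGrowth`, and `ExpensesGrowth` are made up for this example:

```python
import pandas as pd

# Only the five previewed rows; swap in the full `data` frame for the real analysis.
margins = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021, 2022],
    "Revenue": [45.3, 50.1, 47.8, 55.6, 60.9],
    "Expenses": [31.7, 34.9, 36.4, 39.3, 42.8],
    "Profit": [13.6, 15.2, 11.4, 16.3, 18.1],
})

# Share of each revenue unit that is kept as profit.
margins["Margin"] = margins["Profit"] / margins["Revenue"]

# Year-over-year growth; the first year has no predecessor, so it is NaN.
margins["RevenueGrowth"] = margins["Revenue"] / margins["Revenue"].shift(1) - 1
margins["ExpensesGrowth"] = margins["Expenses"] / margins["Expenses"].shift(1) - 1

print(margins[["Year", "Margin", "RevenueGrowth", "ExpensesGrowth"]].round(3))
```

Running the same lines on the full `data` frame would show whether the margin actually rises through 2030 and in which years revenue outpaces expenses.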


In [14]:
base = alt.Chart(data).encode(
    x=alt.X('Year:O', axis=alt.Axis(labelAngle=0, labelOverlap=False), title='Year')
)

rev = base.mark_line(point=True, color='steelblue').encode(y='Revenue:Q', tooltip=['Year','Revenue'])
exp = base.mark_line(point=True, color='orangered').encode(y='Expenses:Q', tooltip=['Year','Expenses'])
pro = base.mark_line(point=True, color='green').encode(y='Profit:Q', tooltip=['Year','Profit'])

(rev + exp + pro).properties(width=600, height=350,
                             title='4.1 – Revenue, Expenses & Profit (2018–2030)')


In [15]:
line_emp = (
    alt.Chart(data)
    .mark_line(point=True, color='purple')
    .encode(
        x=alt.X('Year:O', axis=alt.Axis(labelAngle=0, labelOverlap=False)),
        y=alt.Y('Employees:Q', title='Employees'),
        tooltip=['Year','Employees']
    )
    .properties(width=600, height=350, title='4.2 – Employee Growth Trend')
)
line_emp


### Line Chart Comparison
- Revenue, Expenses, and Profit all trend upward over the period, which is consistent with company expansion.
- Head-count rises alongside revenue, which suggests the company is scaling its workforce with the business.
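
The link between head-count and revenue can be quantified with a Pearson correlation coefficient. A minimal sketch, assuming only the five previewed rows (the `sample` frame is illustrative; the notebook's `data` frame would be used in practice):

```python
import pandas as pd

# Employees and Revenue from the five previewed rows (2018-2022).
sample = pd.DataFrame({
    "Revenue": [45.3, 50.1, 47.8, 55.6, 60.9],
    "Employees": [250, 270, 260, 300, 320],
})

# Pearson correlation (the pandas default for Series.corr).
r = float(sample["Employees"].corr(sample["Revenue"]))
print(round(r, 3))
```

A coefficient close to 1 supports the visual impression, but two series that both trend upward over time will correlate strongly regardless of any causal link, so this is supporting evidence rather than proof.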
