
# Objective

The objective of this Jupyter Notebook is to analyze and forecast web traffic data using data analysis and visualization techniques. We will use the Adobe Customer Journey Analytics (CJA) API to extract data, perform time series analysis, and visualize the results with several Python libraries.


1. **Data Extraction**: We will use the `cjapy` library to extract data from Customer Journey Analytics. The data will include metrics such as visits and orders over a specified date range.

2. **Data Preprocessing**: The extracted data will be preprocessed to convert day-of-year values to actual dates and sort the data accordingly. We will also ensure that the metrics are in the correct format for analysis.

3. **Time Series Analysis**: We will perform time series analysis using the Prophet model to forecast future values. This will include fitting the model to historical data and generating forecasts for the next 90 days.

4. **Visualization**: We will use several visualization techniques to understand the data better, including:
    - Line plots to visualize daily orders over time.
    - Histograms to understand the distribution of daily orders.
    - Interactive plots using Plotly for better data exploration.
    - Rolling statistics to observe trends and patterns.
    - Seasonal decomposition to analyze the observed, trend, seasonal, and residual components of the time series data.
    - Heatmaps to visualize the distribution of orders by day of the week and month.
    - Boxplots to visualize the distribution of orders by month.

5. **Statistical Analysis**: We will use statistical methods to decompose the time series data into its components and understand the underlying patterns and trends.

6. **Volatility Modeling**: We will use the GARCH model to analyze the volatility of the time series data.

By the end of this notebook, we aim to have a comprehensive understanding of the web traffic data and be able to make informed predictions about future trends.
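
As a rough illustration of the preprocessing and rolling-statistics steps listed above, the sketch below converts day-of-year labels to calendar dates, sorts the rows, and computes a trailing mean. It uses only the standard library. The sample rows, the `trailing_mean` helper, and the 3-row window are assumptions made for this example and are not part of the CJA report or of `cjapy`. The helper is meant to mirror what a pandas rolling mean with `min_periods=1` computes.

```python
from datetime import date, timedelta
from statistics import fmean


def day_of_year_to_date(year, day_of_year):
    """Map a 1-based day-of-year value (string or int) to a calendar date."""
    day = int(day_of_year)
    result = date(year, 1, 1) + timedelta(days=day - 1)
    if day < 1 or result.year != year:
        raise ValueError(f"day {day} is out of range for {year}")
    return result


def trailing_mean(values, window):
    """Mean of the most recent `window` values (fewer at the start) for each position."""
    return [fmean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


# Hypothetical report rows: (day-of-year label, orders), arriving unsorted
rows = [("3", 12.0), ("1", 10.0), ("60", 15.0), ("2", 11.0), ("366", 9.0)]

# Convert labels to dates, cast orders to whole numbers, and sort chronologically
series = sorted((day_of_year_to_date(2024, d), int(v)) for d, v in rows)
dates = [d for d, _ in series]
orders = [v for _, v in series]

# The window counts rows, not calendar days, so gaps in the dates are ignored
smoothed = trailing_mean(orders, window=3)

for d, v, s in zip(dates, orders, smoothed):
    print(d.isoformat(), v, round(s, 2))
```

Because 2024 is a leap year, day 60 maps to 29 February and day 366 to 31 December. The range check rejects day 366 for a non-leap year instead of silently rolling over into the next year.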

In [None]:
import cjapy
from datetime import datetime, timedelta
import plotly.graph_objs as go
import json

# Load the configuration and initialize the CJA object
cjapy.importConfigFile("myconfig.json")
cja = cjapy.CJA()

# Specify the Data View ID for analysis
data_view = "dv_677ea9291244fd082f02dd42"

In [None]:
# Function to convert day of year to date
def day_of_year_to_date(year, day_of_year):
    day_of_year = int(day_of_year)  # Convert to integer
    return (datetime(year, 1, 1) + timedelta(day_of_year - 1)).strftime('%Y-%m-%d')

# Pick dimension and metric
dimension = "variables/timepartdayofyear"
metric = "metrics/visits"
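# End the range at 2025-01-01 so that 31 December 2024 is included (same range as the orders request)
dateRange = "2024-01-01T00:00:00.000/2025-01-01T00:00:00.000"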

# Define the report request
myRequest = cjapy.RequestCreator()
myRequest.setDataViewId(data_view)
myRequest.setDimension(dimension)
myRequest.addMetric(metric)
myRequest.addGlobalFilter(dateRange)

# Pull the report from CJA
myReport = cja.getReport(myRequest)

# Convert day of year to date and sort the dataframe
sorted_df = myReport.dataframe.copy()
sorted_df[dimension] = sorted_df[dimension].apply(lambda x: day_of_year_to_date(2024, x))
sorted_df.sort_values(by=dimension, inplace=True)

# Convert "metrics/visits" column to whole numbers
sorted_df[metric] = sorted_df[metric].astype(int)

# Print the sorted dataframe with dimension and metric
print(sorted_df[[dimension, metric]])

In [None]:
from prophet import Prophet
import pandas as pd
import numpy as np
import plotly.graph_objs as go

# Prepare data for Prophet (requires 'ds' and 'y' columns)
df = pd.DataFrame({
    'ds': sorted_df[dimension],
    'y': sorted_df[metric]
})

# Initialize and fit Prophet model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    changepoint_prior_scale=0.05
)
model.fit(df)

# Create future dates dataframe for forecasting
future_dates = model.make_future_dataframe(periods=90)
forecast = model.predict(future_dates)

# Create plot
fig = go.Figure()

# Plot historical data
fig.add_trace(go.Scatter(
    x=df['ds'],
    y=df['y'],
    mode='lines',
    name='Historical Visits'
))

# Plot forecasted data
fig.add_trace(go.Scatter(
    x=forecast['ds'].tail(90),
    y=forecast['yhat'].tail(90),
    mode='lines+markers',
    name='Forecast',
    line=dict(dash='dot')
))

# Add confidence intervals
fig.add_trace(go.Scatter(
    x=pd.concat([forecast['ds'].tail(90), forecast['ds'].tail(90)[::-1]]),
    y=pd.concat([forecast['yhat_lower'].tail(90), forecast['yhat_upper'].tail(90)[::-1]]),
    fill='toself',
    fillcolor='rgba(0,100,80,0.2)',
    line=dict(color='rgba(255,255,255,0)'),
    name='Confidence Band'
))

# Update layout
fig.update_layout(
    title='Forecast of Visits Over Time using Prophet',
    xaxis_title='Date',
    yaxis_title='Visits',
    legend=dict(y=0.5, traceorder='reversed')
)

# Show plot
fig.show()

In [None]:
# Function to convert day of year to date
def day_of_year_to_date(year, day_of_year):
    day_of_year = int(day_of_year)  # Convert to integer
    return (datetime(year, 1, 1) + timedelta(day_of_year - 1)).strftime('%Y-%m-%d')

# Pick dimension and metric
dimension = "variables/timepartdayofyear"
metric = "metrics/orders"
dateRange = "2024-01-01T00:00:00.000/2025-01-01T00:00:00.000"

# Define the report request
myRequest = cjapy.RequestCreator()
myRequest.setDataViewId(data_view)
myRequest.setDimension(dimension)
myRequest.addMetric(metric)
myRequest.addGlobalFilter(dateRange)

# Pull the report from CJA
myReport = cja.getReport(myRequest)

# Create a named DataFrame and process it
orders_df = myReport.dataframe.copy()
orders_df[dimension] = orders_df[dimension].apply(lambda x: day_of_year_to_date(2024, x))
orders_df.sort_values(by=dimension, inplace=True)

# Convert "metrics/orders" column to whole numbers
orders_df[metric] = orders_df[metric].astype(int)

# Print the sorted DataFrame
print(orders_df[[dimension, metric]])

In [None]:
# Import pandas
import pandas as pd

# Set pandas display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Display the dataframe
display(orders_df[[dimension, metric]])

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.dates as mdates
from statsmodels.tsa.seasonal import seasonal_decompose

# Ensure orders_df is loaded before calculations
if 'orders_df' not in locals():
    raise ValueError("orders_df is not defined. Please load the 2024 data correctly.")

# Set correct column name for orders
orders_column = 'metrics/orders'
if orders_column not in orders_df.columns:
    raise KeyError(f"Column '{orders_column}' not found in orders_df. Check column names: {orders_df.columns}")

# Summary statistics of the actual 2024 orders data
average_2024_orders = orders_df[orders_column].mean()
min_2024_orders = orders_df[orders_column].min()
max_2024_orders = orders_df[orders_column].max()

# Generate dates for 2025 forecast
dates = pd.date_range(start='2025-01-01', end='2025-12-31', freq='D')

# Build a simulated 2025 projection anchored on the 2024 actuals
# (a hand-specified trend and seasonal term, not the output of a fitted model)
adjusted_baseline_orders = average_2024_orders
adjusted_trend = -0.02 * np.arange(len(dates))  # assumed decline of 0.02 orders per day
adjusted_seasonality = (max_2024_orders - min_2024_orders) * 0.1 * np.sin(2 * np.pi * dates.dayofyear / 365)

# Combine baseline, seasonality, trend, and Gaussian noise (sd = 5); unseeded, so each run differs
adjusted_orders = adjusted_baseline_orders + adjusted_seasonality + adjusted_trend + np.random.normal(0, 5, len(dates))
# Floor the simulated values at the 2024 minimum
adjusted_orders = np.maximum(adjusted_orders, min_2024_orders)

orders_df_2025 = pd.DataFrame({'Date': dates, 'metrics/orders': adjusted_orders})

# Compute 7-day rolling average and a fixed +/-10 band (illustrative, not a statistical confidence interval)
orders_df_2025['7_day_rolling_avg'] = orders_df_2025['metrics/orders'].rolling(window=7, min_periods=1).mean()
orders_df_2025['upper_bound'] = orders_df_2025['metrics/orders'] + 10
orders_df_2025['lower_bound'] = np.maximum(orders_df_2025['metrics/orders'] - 10, 0)

# Handle NaN values before seasonal decomposition
orders_df_2025.dropna(subset=['metrics/orders'], inplace=True)

decomposition = seasonal_decompose(orders_df_2025['metrics/orders'], model='additive', period=30)
orders_df_2025['trend'] = decomposition.trend
orders_df_2025['seasonal'] = decomposition.seasonal
orders_df_2025['residual'] = decomposition.resid

# Interactive Plotly Visualization with Confidence Intervals
fig = go.Figure()
fig.add_trace(go.Scatter(x=orders_df_2025['Date'], y=orders_df_2025['metrics/orders'],
                         mode='lines', name='Daily Orders',
                         line=dict(color='blue')))
fig.add_trace(go.Scatter(x=orders_df_2025['Date'], y=orders_df_2025['7_day_rolling_avg'],
                         mode='lines', name='7-day Rolling Avg',
                         line=dict(color='orange')))
fig.add_trace(go.Scatter(x=orders_df_2025['Date'], y=orders_df_2025['upper_bound'],
                         mode='lines', name='Upper Bound',
                         line=dict(color='lightgrey'), fill=None))
fig.add_trace(go.Scatter(x=orders_df_2025['Date'], y=orders_df_2025['lower_bound'],
                         mode='lines', name='Lower Bound',
                         line=dict(color='lightgrey'), fill='tonexty', opacity=0.3))

fig.update_layout(title='Interactive Daily Orders Trend with Confidence Interval',
                  xaxis_title='Date', yaxis_title='Number of Orders',
                  hovermode='x unified', xaxis=dict(tickformat='%b %Y', tickangle=45))
fig.show()

# Seasonal Decomposition Interactive Plots
fig_seasonal = px.line(orders_df_2025.dropna(), x='Date', y=['trend', 'seasonal', 'residual'],
                        title='Seasonal Decomposition of Orders', labels={'value': 'Orders'})
fig_seasonal.show()

# Monthly Aggregation Plot
orders_df_2025['Month'] = orders_df_2025['Date'].dt.to_period('M')
monthly_orders = orders_df_2025.groupby('Month')['metrics/orders'].sum().reset_index()

# Convert 'Month' to string to avoid issues with Plotly serialization
monthly_orders['Month'] = monthly_orders['Month'].astype(str)

fig_monthly = px.line(monthly_orders, x='Month', y='metrics/orders', 
                      title='Monthly Orders Aggregation for 2025', 
                      labels={'metrics/orders': 'Total Orders'})
fig_monthly.show()

# Heatmap by Day of Week and Month
orders_df_2025['DayOfWeek'] = orders_df_2025['Date'].dt.dayofweek
orders_df_2025['Month'] = orders_df_2025['Date'].dt.month
heatmap_data = orders_df_2025.pivot_table(index='DayOfWeek', columns='Month', values='metrics/orders', aggfunc='mean')
fig_heatmap = px.imshow(heatmap_data, color_continuous_scale='Viridis',
                        labels={'x': 'Month', 'y': 'Day of Week'}, 
                        title='Heatmap of Orders by Day of Week and Month')
fig_heatmap.show()

# Boxplot for Orders by Month
fig_box = px.box(orders_df_2025, x='Month', y='metrics/orders', 
                 title='Boxplot of Orders by Month for 2025', 
                 labels={'metrics/orders': 'Orders', 'Month': 'Month'})
fig_box.update_traces(boxmean='sd')
fig_box.show()


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.dates as mdates

# Generate sample data for a full year with a downward trend and seasonality
dates = pd.date_range(start='2025-01-01', end='2025-12-31', freq='D')
np.random.seed(42)

# Lower mean and introduce a downward trend
baseline_orders = 10
seasonality = 10 * np.sin(2 * np.pi * dates.dayofyear / 365)  # Seasonal effect
trend = -0.02 * np.arange(len(dates))  # Gradual downward trend
random_variation = np.random.normal(0, 10, len(dates))  # Gaussian noise (sd = 10)

# Generate final orders data
orders = baseline_orders + seasonality + trend + random_variation
orders = np.maximum(orders, 1)  # Floor at 1: no negative values, and the later log transform stays defined

# Note: this replaces the CJA-derived orders_df with the synthetic series
orders_df = pd.DataFrame({'Date': dates, 'Orders': orders})

# Time Series Visualization
plt.figure(figsize=(15, 6))
sns.lineplot(data=orders_df, x='Date', y='Orders')
plt.title('Daily Orders Over Time (Adjusted Forecast)')
plt.xticks(rotation=45)
plt.gca().xaxis.set_major_locator(mdates.MonthLocator())  
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  
plt.tight_layout()

# Statistical Distribution
plt.figure(figsize=(10, 6))
sns.histplot(data=orders_df, x='Orders', kde=True)
plt.title('Distribution of Daily Orders')

# Interactive Plotly Visualization
fig = px.line(orders_df, x='Date', y='Orders',
              title='Interactive Daily Orders Trend')
fig.update_layout(
    xaxis_title="Date",
    yaxis_title="Number of Orders",
    hovermode='x unified',
    xaxis=dict(
        tickformat="%b %Y",
        tickangle=45,
        dtick="M1"
    )
)
fig.show()

# Rolling Statistics
orders_df['7_day_rolling_avg'] = orders_df['Orders'].rolling(window=7).mean()
plt.figure(figsize=(15, 6))
plt.plot(orders_df['Date'], orders_df['Orders'], label='Daily Orders')
plt.plot(orders_df['Date'], orders_df['7_day_rolling_avg'], 
         label='7-day Moving Average', linewidth=2)
plt.title('Daily Orders with Rolling Average')
plt.legend()
plt.gca().xaxis.set_major_locator(mdates.MonthLocator())  
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  
plt.xticks(rotation=45)
plt.tight_layout()

plt.show()


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from arch import arch_model

# Treat orders_df as a time series (first column: date, second column: orders)
date_column = orders_df.columns[0]  # First column should be the date
orders_column = orders_df.columns[1]  # Second column should be orders

try:
    # Set the date as index
    orders_df.set_index(date_column, inplace=True)
    
    # Calculate log returns and rescale by multiplying by 10
    orders_df['log_returns'] = np.log(orders_df[orders_column]).diff() * 10
    returns = orders_df['log_returns'].dropna()
    
    # Fit GARCH(1,1) model
    model = arch_model(returns, vol='Garch', p=1, q=1)
    results = model.fit(disp='off')
    
    # Plot the original returns and volatility
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))
    
    # Plot returns
    ax1.plot(returns)
    ax1.set_title('Scaled Order Returns')
    ax1.set_ylabel('Return Value (Scaled)')
    
    # Plot conditional volatility (already a standard deviation, so no square root is needed)
    ax2.plot(results.conditional_volatility)
    ax2.set_title('Conditional Volatility')
    ax2.set_ylabel('Volatility')
    
    plt.xlabel('Date')
    plt.tight_layout()
    plt.show()
    
    # Print model summary
    print(results.summary())

except Exception as e:
    print(f"An error occurred: {str(e)}")
    print("\nDataframe information:")
    orders_df.info()  # info() prints its own report; this cell works on orders_df, not sorted_df

In [None]:
orders_df.sample(10)

In [None]:
orders_df.to_csv('prediction_orders.2024.2025.csv')