# OSE Content Reporting Assistant

The goal of this tool is to make content reporting consistent and fast.

It enables you to upload a sheet of content and then:
* Categorize by flywheel stage
* Categorize by AI content
* Summarize results

To use this tool, start by uploading a CSV. Then execute each of the cells with the "Play" button on the left.

## Import libraries

In [1]:
!pip3 install --quiet --upgrade google-cloud-aiplatform \
    beautifulsoup4 pandas plotly requests

To load the newly installed Vertex AI SDK for Python, the next cell restarts the notebook kernel. After the restart, continue from the cell that follows it.

In [2]:
import IPython

# Restart the kernel so the newly installed packages can be imported
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}


In [2]:
from bs4 import BeautifulSoup
import pandas as pd
import plotly.express as px
import requests
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Image,
    Part
)

In [3]:
import vertexai

# Placeholders: replace with your own Google Cloud project ID and region.
# Without this call, GenerativeModel raises "ValueError: Unable to find your project".
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.0-pro")

## Read spreadsheet

In [None]:
csv_file_path = './January.csv'

df = pd.read_csv(csv_file_path)
df.head()
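
The later cells read the columns `URL`, `Content Type`, `Title`, `Flywheel Stage`, and `AI Content`, so it may help to confirm they exist right after loading the CSV. The following is a minimal sketch of such a check; the helper name `missing_columns` and the constant `REQUIRED_COLUMNS` are illustrative and not part of the original notebook.

```python
# Column names that later cells of this notebook read from the sheet
REQUIRED_COLUMNS = ["URL", "Content Type", "Title", "Flywheel Stage", "AI Content"]

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = {str(name).strip() for name in columns}
    return [name for name in required if name not in present]

# Example with a plain list; in the notebook, pass df.columns instead
example_header = ["Title", "URL", "Content Type"]
print(missing_columns(example_header))
```

An empty list means every expected column is present. In the notebook, the call would be `missing_columns(df.columns)`.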

## Read URL contents

To help us categorize content better, let's extract the content from each URL.

* This may take a couple of minutes
* Some URLs cannot be downloaded; these print an error and leave the "Contents" cell empty

In [None]:
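def extract_text_from_url(url):
    """Download a page and return its visible text, or None if the request fails."""
    try:
        # The timeout (an arbitrary 30 seconds) keeps one slow site from stalling the run
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Raise on HTTP 4xx/5xx responses

        soup = BeautifulSoup(response.content, 'html.parser')

        # Join all text nodes with single spaces, trimming surrounding whitespace
        return soup.get_text(strip=True, separator=" ")
    except requests.exceptions.RequestException as e:
        print(f"Error for URL {url}: {e}")
        return None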

# Apply the extraction function to every URL
df['Contents'] = df['URL'].apply(extract_text_from_url)

df.head()
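
Because failed downloads only print an error as they happen, it can be useful to collect them in one place afterwards. The sketch below assumes that a failed row holds `None` or NaN in its contents; `failed_urls` is a hypothetical helper, and the example URLs are made up.

```python
import math

def failed_urls(urls, contents):
    """Pair each URL with its extracted text and return the URLs with no text."""
    failed = []
    for url, text in zip(urls, contents):
        # pandas may store a missing value as None or as a float NaN
        is_nan = isinstance(text, float) and math.isnan(text)
        if text is None or is_nan:
            failed.append(url)
    return failed

# Example with plain lists; in the notebook, pass df['URL'] and df['Contents']
urls = ["https://example.com/a", "https://example.com/b"]
contents = ["Some page text", None]
print(failed_urls(urls, contents))
```

Rows listed here are still classified later, but only from their content type, title, and URL.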

## Categorize content by flywheel stage

We will now create a new column in the sheet called "Flywheel Stage - Gemini" containing the LLM's assessment of the appropriate flywheel stage.

First, we define a lengthy prompt that describes each flywheel stage in detail so that the categorization is consistent.

In [None]:
flywheel_stage_prompt = """
What is the flywheel stage of this content? First, I will provide more info on each stage. Then, I will provide you the content details for you to evaluate.

Please only reply with one word, the flywheel stage: Evaluate, Activate, Adopt, Expand, or Advocate.

## Evaluate

Description: Cloud developers initially discover your product and start gauging if it fits their needs.
Content Focus: Explain your product's value proposition, key features, and pricing clearly, addressing problems/workflows relevant to cloud devs.
Content Examples:
* Blog Posts: "Solving [Cloud Problem] with [Your Product]"
* Webinars: Quick, high-level product overview demos
* Case Studies: How similar devs benefit from your product

## Activate

Description: Developers try your product, aiming to see if it delivers on its promises.
Content Focus: Guide first-time users toward their "aha moment" as efficiently as possible.
Content Examples:
* Quick Start Guides: Step-by-step setup and basic usage
* Interactive Tutorials: Hands-on demos in a sandbox environment
* Cheat Sheets: Essential commands/configurations in a printable format

## Adopt

Description: Developers integrate your product into their projects, making it part of their toolchain.
Content Focus: Enable devs to discover the product's depth and solve more complex problems.
Content Examples:
* Technical Deep Dives: Blog posts on advanced use cases
* Best Practices Webinars: Efficient, scalable, and secure deployment methods
* Reference Documentation: Detailed API descriptions

## Expand

Description: Devs uncover more ways your product can benefit their work, potentially upgrading to broader plans.
Content Focus: Highlight features/functionality that unlock new use cases or efficiencies.
Content Examples:
* Integrations Showcase: How to use your product alongside other cloud development tools
* Webinars on New/Premium Features: Targeting experienced users
* Customer Success Stories: How devs solve wider challenges with advanced tool functionality

## Advocate

Description: Developers actively promote your product within their communities due to their positive experiences.
Content Focus: Empower and incentivize users to share their expertise and enthusiasm.
Content Examples:
* Community Forums: Where devs help each other, with your product team's support
* Guest Blog Posts: Devs authoring content hosted on your site
* Referral Programs: Rewarding devs for bringing in new users
"""

In [None]:
def classify_content(prompt, content_type, title, url, contents, model):
    """Send the prompt plus one row's details to the model and return its reply text."""
    # Rows whose download failed have no contents (None or NaN); send an empty string
    if pd.isna(contents):
        contents = ''
    combined_prompt = f"{prompt}\n\nContent Type: {content_type}\nTitle: {title}\nURL: {url}\nContents: {contents}"
    response = model.generate_content(combined_prompt, stream=False).text.strip()
    print(response)
    return response

df['Flywheel Stage - Gemini'] = df.apply(lambda row: classify_content(flywheel_stage_prompt, row['Content Type'], row['Title'], row['URL'], row['Contents'], model), axis=1)

df.head()
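
The prompt asks for a one-word reply, but a model may still add punctuation, change the casing, or name more than one stage. Here is a minimal sketch of one way to normalize replies before counting them; `normalize_stage` is a hypothetical helper, and treating ambiguous replies as `None` is an assumption rather than the notebook's behavior.

```python
import re

STAGES = ["Evaluate", "Activate", "Adopt", "Expand", "Advocate"]

def normalize_stage(reply):
    """Map a free-form model reply to one of the five stage names, or None."""
    if not isinstance(reply, str):
        return None
    # Split the reply into alphabetic words so punctuation and markup are ignored
    words = re.findall(r"[A-Za-z]+", reply)
    matches = {stage for stage in STAGES
               if any(word.lower() == stage.lower() for word in words)}
    # Accept the reply only when exactly one stage is named
    return matches.pop() if len(matches) == 1 else None

print(normalize_stage(" **evaluate**.\n"))
print(normalize_stage("Evaluate or Activate"))
```

In the notebook this could be applied with `df['Flywheel Stage - Gemini'].apply(normalize_stage)`, so that the later stage counts compare exact stage names.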

## Categorize content by AI

In [None]:
ai_prompt = """Please determine if the content is at least partially AI-related. Only reply with a number, 1 if yes, 0 if not.
"""

df['AI Content - Gemini'] = df.apply(lambda row: classify_content(ai_prompt, row['Content Type'], row['Title'], row['URL'], row['Contents'], model), axis=1)

df.head()
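
The model's reply arrives as text, while a manually filled `AI Content` column would typically hold the numbers 1 and 0. The sketch below shows one possible way to convert replies to integers and flag anything unexpected; `parse_ai_flag` is an illustrative name, not part of the original notebook.

```python
def parse_ai_flag(reply):
    """Return 1 or 0 for a reply of "1"/"0", or None if it is anything else."""
    if not isinstance(reply, str):
        return None
    # Tolerate surrounding whitespace and a trailing period
    cleaned = reply.strip().strip(".")
    return int(cleaned) if cleaned in ("0", "1") else None

print(parse_ai_flag(" 0\n"))
print(parse_ai_flag("yes"))
```

Replies that come back as `None` are worth reviewing by hand rather than counting as either category.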


## Summarize results

### Flywheel stage

In [None]:
colors = ['blue']
stages = ['Evaluate','Activate', 'Adopt','Expand','Advocate']
stage_counts = [df[df['Flywheel Stage'] == stage].shape[0] for stage in stages]
# stage_counts_ai = [df[df['Flywheel Stage - Gemini'] == stage].shape[0] for stage in stages]

# Build a temporary DataFrame for structured plotting
df_temp = pd.DataFrame({
    'stages': stages,
    'value': stage_counts,
    'classification': ['Manual'] * 5
})

fig = px.line_polar(df_temp, r='value', theta='stages', color='classification',
                    line_close=True, title='Content Classification', color_discrete_sequence=colors)

fig.show()

In [None]:
colors = ['blue', 'orange']
stages = ['Evaluate','Activate', 'Adopt','Expand','Advocate']
stage_counts = [df[df['Flywheel Stage'] == stage].shape[0] for stage in stages]
stage_counts_ai = [df[df['Flywheel Stage - Gemini'] == stage].shape[0] for stage in stages]

# Build a temporary DataFrame for structured plotting
df_temp = pd.DataFrame({
    'stages': stages * 2,
    'value': stage_counts + stage_counts_ai,
    'classification': ['Manual'] * 5 + ['AI'] * 5
})

fig = px.line_polar(df_temp, r='value', theta='stages', color='classification',
                    line_close=True, title='Manual vs. AI Classification', color_discrete_sequence=colors)

fig.show()
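
The polar chart compares totals per stage, so two classifications can look similar even when individual rows disagree. As a complementary check, here is a small sketch that computes row-level agreement; `agreement_summary` is a hypothetical helper and the sample labels are invented for illustration.

```python
from collections import Counter

def agreement_summary(manual, ai):
    """Return (share of matching labels, Counter of (manual, ai) mismatches)."""
    pairs = list(zip(manual, ai))
    if not pairs:
        return 0.0, Counter()
    mismatches = Counter((m, a) for m, a in pairs if m != a)
    matched = len(pairs) - sum(mismatches.values())
    return matched / len(pairs), mismatches

# Example with plain lists; in the notebook, pass
# df['Flywheel Stage'] and df['Flywheel Stage - Gemini']
manual = ["Evaluate", "Adopt", "Adopt", "Expand"]
ai = ["Evaluate", "Activate", "Adopt", "Expand"]
rate, mismatches = agreement_summary(manual, ai)
print(f"{rate:.0%} agreement")
print(mismatches.most_common())
```

The mismatch counter shows which pairs of stages are confused most often, which can guide refinements to the prompt.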

In [None]:
import plotly.subplots as sp

# Count rows per manual 'AI Content' label (stored as the numbers 1 and 0)
df_ai_content = df.groupby('AI Content').size().reset_index(name='Count')
df_ai_content['AI Content'] = df_ai_content['AI Content'].replace({1: 'AI', 0: 'Non-AI'})

# Count rows per 'AI Content - Gemini' label (the model replies with the strings "1" and "0")
df_ai_content_gemini = df.groupby('AI Content - Gemini').size().reset_index(name='Count')
df_ai_content_gemini['AI Content - Gemini'] = df_ai_content_gemini['AI Content - Gemini'].replace({"1": 'AI', "0": 'Non-AI'})

fig = sp.make_subplots(rows=1, cols=2,
                       specs=[[{"type": "pie"}, {"type": "pie"}]],
                       subplot_titles=("AI Content - Manual", "AI Content - Gemini"))

# Pie chart for 'AI Content'
pie1 = px.pie(df_ai_content, values='Count', names='AI Content', hole=0.3)
pie1.update_traces(textposition='inside', textinfo='percent+label')

# Pie chart for 'AI Content - Gemini'
pie2 = px.pie(df_ai_content_gemini, values='Count', names='AI Content - Gemini', hole=0.3)
pie2.update_traces(textposition='inside', textinfo='percent+label')

# Add charts to subplots (using data property)
fig.add_trace(pie1.data[0], row=1, col=1)
fig.add_trace(pie2.data[0], row=1, col=2)

# Adjust layout (optional)
fig.update_layout(height=400, width=800, showlegend=False)

# Display the plot
fig.show()
