<img width="8%" alt="Google Sheets.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/Google%20Sheets.png" style="border-radius: 15%">

# Google Sheets - Send data to spreadsheet

**Tags:** #googlesheets #gsheet #data #naas_drivers #operations #snippet

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Description:** This notebook sends data to a Google Sheets spreadsheet.

## Input

### Import libraries

In [1]:
from naas_drivers import gsheet
import pandas as pd
import os
from datetime import date
import naas  # needed for naas.secret.get in the setup cell
import naas_data_product

✅ utils file '/home/ftp/abi/utils/data.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_chat_plugin.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_lab.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/openai.ipynb' successfully loaded.


### Setup variables
**Inputs**
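- `input_dir`: Input directory to retrieve the file from.
- `file_name`: Name of the file to retrieve.
- `openai_api_key`: OpenAI API key used to generate topics, read from the naas secret `OPENAI_API_KEY`.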

**Outputs**
- `spreadsheet_url`: Google Sheets spreadsheet URL.
- `sheet_name`: Google Sheets sheet name.
- `append`: If `False`, the existing data in the sheet is deleted and replaced.
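
To make the effect of `append` concrete, here is a minimal pandas-only sketch of the two behaviours. It does not call Google Sheets: `simulate_send` is a hypothetical helper, and the idea that `append=True` adds rows below the existing ones is an assumption inferred from the parameter name, not something this notebook states.

```python
import pandas as pd


def simulate_send(existing: pd.DataFrame, new: pd.DataFrame, append: bool) -> pd.DataFrame:
    """Hypothetical stand-in for the sheet content after a send."""
    if append:
        # Assumed behaviour: new rows are added below the existing ones.
        return pd.concat([existing, new], ignore_index=True)
    # append=False: the existing content is discarded and replaced.
    return new.reset_index(drop=True)


existing = pd.DataFrame({"TITLE": ["old post"]})
new = pd.DataFrame({"TITLE": ["new post A", "new post B"]})

replaced = simulate_send(existing, new, append=False)
appended = simulate_send(existing, new, append=True)
print(len(replaced), len(appended))
```

With `append = False`, as set in this notebook, the sheet ends up holding only the rows that are sent, which is why the cleaning step merges the existing sheet rows back in before sending.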

In [2]:
# Inputs
input_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "content-engine", date.today().isoformat())
file_name = "linkedin_posts"
openai_api_key = naas.secret.get("OPENAI_API_KEY")

# Outputs
spreadsheet_url = naas.secret.get("ABI_SPREADSHEET") or "YOUR_GOOGLE_SPREADSHEET_URL"
sheet_name = "CONTENT"
append = False

## Model

### Get data from Google Sheets spreadsheet

In [3]:
df_gsheet = gsheet.connect(spreadsheet_url).get(sheet_name=sheet_name)
print("Rows:", len(df_gsheet))
df_gsheet.head(1)

Rows: 49


Unnamed: 0,ENTITY,SCENARIO,SOURCE,PUBLISHED_DATE,DATE,TIME,TITLE,CONTENT,TOPICS,CONTENT_LENGTH,KEYWORDS,VIEWS,LIKES,COMMENTS,SHARES,ENGAGEMENT_SCORE,CONTENT_URL
0,Travis Oliphant,W48-2023,LinkedIn,2023-11-27 02:24:48+0100,Mon. 27 Nov.,02H24,We can only offer this for free download for a...,We can only offer this for free download for a...,"""Free download offer, limited time availabilit...",97,,0,41,3,2,0,https://www.linkedin.com/feed/update/urn:li:ac...


### Get posts from local

In [4]:
df_posts = pload(input_dir, file_name)
print("Rows:", len(df_posts))
df_posts.head(1)

Rows: 1


Unnamed: 0,ACTIVITY_ID,PAGINATION_TOKEN,PUBLISHED_DATE,AUTHOR_NAME,AUTHOR_URL,SUBDESCRIPTION,TITLE,TEXT,CHARACTER_COUNT,TAGS,...,POLL_ID,POLL_QUESTION,POLL_RESULTS,POST_URL,VIEWS,COMMENTS,LIKES,SHARES,ENGAGEMENT_SCORE,DATE_EXTRACT
0,7141422260812210176,dXJuOmxpOmFjdGl2aXR5OjcxNDE0MjIyNjA4MTIyMTAxNz...,2023-12-15 14:42:28+0100,Travis Oliphant,https://www.linkedin.com/in/ACoAAACOHgYBT-RIKF...,16h •,Good old days stories:,"Good old days stories: \n\nAs a child, after t...",289,,...,,,,https://www.linkedin.com/feed/update/urn:li:ac...,0,4,36,0,0,2023-12-16 07:41:50


### Cleaning data
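
The cleaning cell derives the `SCENARIO`, `DATE` and `TIME` columns from `PUBLISHED_DATE`. As a minimal sketch of that step on its own, the snippet below uses a one-row frame built from the sample timestamp shown in the outputs of this notebook. The weekday and month abbreviations assume the default (English) locale.

```python
import pandas as pd

df = pd.DataFrame({"PUBLISHED_DATE": ["2023-12-15 14:42:28+0100"]})

# Keep the first 19 characters to drop the UTC offset, then parse once.
published = pd.to_datetime(df["PUBLISHED_DATE"].str[:19], format="%Y-%m-%d %H:%M:%S")

df["SCENARIO"] = published.dt.strftime("W%W-%Y")  # week of year (Monday first) and year
df["DATE"] = published.dt.strftime("%a. %d %b.")  # abbreviated weekday, day, month
df["TIME"] = published.dt.strftime("%HH%M")       # hour and minute separated by "H"

print(df.iloc[0].to_dict())
```

Because the offset is cut off before parsing, the derived values reflect the local time written in the string rather than UTC.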

In [5]:
# Get topics
topics = {}
if "TOPICS" in df_gsheet.columns:
    for row in df_gsheet.itertuples():
        topics[row.CONTENT_URL] = row.TOPICS

df = df_posts.copy()

# Posts with no title whose content type is 'Video (native)' get the title "Live"
df.loc[(df["TITLE"].astype(str) == 'None') & (df["CONTENT"] == 'Video (native)'), "TITLE"] = "Live"
# Posts titled "Live" also get "Live" as their text
df.loc[df["TITLE"].astype(str) == 'Live', "TEXT"] = "Live"

# Select the columns to keep
to_select = [
    "AUTHOR_NAME",
    "PUBLISHED_DATE",
    "TITLE",
    "TEXT",
    "CHARACTER_COUNT",
    "TAGS",
    "VIEWS",
    "LIKES",
    "COMMENTS",
    "SHARES",
    "ENGAGEMENT_SCORE",
    "POST_URL"
]

to_rename = {
    "POST_URL": "CONTENT_URL",
    "AUTHOR_NAME": "ENTITY",
    "TEXT": "CONTENT",
    "CHARACTER_COUNT": "CONTENT_LENGTH",
    "TAGS": "KEYWORDS",
}
df = df[to_select]
df = df.rename(columns=to_rename)
# Parse the publication timestamp once (the first 19 characters drop the UTC offset)
published = pd.to_datetime(df['PUBLISHED_DATE'].str[:19], format='%Y-%m-%d %H:%M:%S')
df.insert(loc=1, column="SCENARIO", value=published.dt.strftime("W%W-%Y"))
df.insert(loc=2, column="SOURCE", value="LinkedIn")
df.insert(loc=4, column="DATE", value=published.dt.strftime("%a. %d %b."))
df.insert(loc=5, column="TIME", value=published.dt.strftime('%HH%M'))
df.insert(loc=8, column="TOPICS", value="TBU")

# Merge with the existing sheet rows and drop duplicates (new posts come first, so they are kept)
df = pd.concat([df, df_gsheet])
df = df.drop_duplicates("CONTENT_URL", keep='first').reset_index(drop=True)

# Add new topics
prompt = "Identify the main topics discussed in the content and provide a concise list in a string"
for row in df.itertuples():
    content_url = row.CONTENT_URL
    content = row.CONTENT
    if content_url not in topics:
        topic = create_chat_completion(
            openai_api_key,
            prompt,
            content
        )
        topics[content_url] = topic
        pdump(input_dir, topics, "topics")
df["TOPICS"] = df["CONTENT_URL"].map(topics)

print("Rows:", len(df))
df.head(1)

Rows: 50


Unnamed: 0,ENTITY,SCENARIO,SOURCE,PUBLISHED_DATE,DATE,TIME,TITLE,CONTENT,TOPICS,CONTENT_LENGTH,KEYWORDS,VIEWS,LIKES,COMMENTS,SHARES,ENGAGEMENT_SCORE,CONTENT_URL
0,Travis Oliphant,W50-2023,LinkedIn,2023-12-15 14:42:28+0100,Fri. 15 Dec.,14H42,Good old days stories:,"Good old days stories: \n\nAs a child, after t...","""Childhood memories, Timex Sinclair, TI99-4A, ...",289,,0,36,4,0,0,https://www.linkedin.com/feed/update/urn:li:ac...
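
The second half of the cleaning cell merges the new posts with the rows already in the sheet, then requests topics only for URLs that have none yet. Below is a minimal sketch of that pattern on toy data. `fake_topic_fn` is a hypothetical stand-in for the chat-completion call (no API is contacted), and the `example.com` URLs and texts are made up for illustration.

```python
import pandas as pd


def fake_topic_fn(content: str) -> str:
    """Hypothetical stand-in for the chat-completion call; no API is contacted."""
    return f"topics for: {content[:20]}"


# Rows already in the sheet, with topics filled in.
df_sheet = pd.DataFrame({
    "CONTENT_URL": ["https://example.com/a"],
    "CONTENT": ["older text for post A"],
    "TOPICS": ["topic A"],
})
# Freshly extracted rows: one refreshes post A, one is new.
df_new = pd.DataFrame({
    "CONTENT_URL": ["https://example.com/a", "https://example.com/b"],
    "CONTENT": ["fresh text for post A", "text for post B"],
    "TOPICS": ["TBU", "TBU"],
})

# Cache of known topics, keyed by URL.
topics = dict(zip(df_sheet["CONTENT_URL"], df_sheet["TOPICS"]))

# New rows come first, so keep="first" prefers them over the sheet's copy.
df = pd.concat([df_new, df_sheet])
df = df.drop_duplicates("CONTENT_URL", keep="first").reset_index(drop=True)

# Only URLs missing from the cache trigger a (stand-in) topic request.
calls = 0
for row in df.itertuples():
    if row.CONTENT_URL not in topics:
        topics[row.CONTENT_URL] = fake_topic_fn(row.CONTENT)
        calls += 1

# Replace the "TBU" placeholders with cached or newly generated topics.
df["TOPICS"] = df["CONTENT_URL"].map(topics)
print(len(df), calls)
```

Keying the cache on `CONTENT_URL` means a post whose metrics were refreshed keeps the topics it already had, and only genuinely new posts cost a request.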


## Output

### Send data to Google Sheets spreadsheet

In [6]:
gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df, append=append)

{'insertedRow': 50}