
# GitHub Traffic For /statmike/vertex-ai-mlops

Using the [GitHub API](https://docs.github.com/en/rest/metrics/statistics?apiVersion=2022-11-28) to:
- get traffic data and engagement data (stars, forks, watchers)

**Notes:**

The API offers traffic and engagement (stars, forks, watchers) data through these endpoints:
- `/traffic/clones`
- `/traffic/popular/paths`
- `/traffic/popular/referrers`
- `/traffic/views`
- `/stargazers`
- `/forks`
- `/subscribers`


Approach notes:
- I prefer not to convert dates/times in pandas and instead leave that conversion as a step in BigQuery.  Why? Loading a dataframe to BigQuery passes through a middle layer where the data is serialized and transferred.  That middle step is another set of format conversions that can alter dates/times.  This can cause errors when later appending to the same BigQuery tables, even when the new dataframe matches the original identically. A -> B -> C is not the same as A -> B|C -> C.
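
As a minimal sketch of that approach, the helper below only builds the SQL text that would parse the stored strings on the BigQuery side. The function name `parse_timestamp_sql`, the output column `ts`, and the project name `my-project` are hypothetical. The format string assumes the `2023-02-09T00:00:00Z` layout that appears in the outputs of this notebook.

```python
def parse_timestamp_sql(table, column="timestamp"):
    """Build a query that converts a STRING column to TIMESTAMP inside BigQuery."""
    # The literal T and Z in the format match the ISO-8601 strings kept in the table.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f"SELECT * EXCEPT(`{column}`), "
        f"PARSE_TIMESTAMP('{fmt}', `{column}`) AS ts "
        f"FROM `{table}`"
    )

# 'my-project' is a placeholder project ID, not one used by this notebook
print(parse_timestamp_sql("my-project.github_metrics.traffic_views"))
```

The dataframe keeps the timestamp as the string returned by the API, so the load step has nothing to reinterpret, and the conversion happens once in SQL.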

---
## Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/architectures/tracking/setup/github/GitHub%20Metrics%20-%201%20-%20Traffic%20-%20Initial%20Creation.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [102]:
PROJECT_ID = 'vertex-ai-mlops-369716' # replace with project ID

In [51]:
try:
    import google.colab
    try:
      from google.cloud import secretmanager
    except ImportError:
      !pip install google-cloud-secret-manager -q
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

Updated property [core/project].


---
## Setup

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'

github_user = 'statmike'
github_repo = 'vertex-ai-mlops'

BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'github_metrics'

In [3]:
import requests
import json
import time
from datetime import datetime
import pandas as pd
import numpy as np
from io import StringIO
import os, shutil
import urllib.parse

from google.cloud import bigquery
from google.cloud import secretmanager

In [4]:
bq = bigquery.Client(project = PROJECT_ID)
secret_client = secretmanager.SecretManagerServiceClient()

In [5]:
secret = secret_client.access_secret_version(request = {"name": f'projects/{PROJECT_ID}/secrets/github_api/versions/latest'})
pat = secret.payload.data.decode('utf-8')

---
## GitHub API

Define the API URL for the user and repository.  Create a helper function that makes GET requests to API addresses and, if it receives a 202 response (request accepted), waits 10 seconds and retries until the response is no longer 202 (ideally a 200, successful response).

In [6]:
github_api_url = f'https://api.github.com/repos/{github_user}/{github_repo}'

In [7]:
def metric_get(metric_type, query_parameters = ''):
  response = requests.get(f'{github_api_url}/{metric_type}{query_parameters}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
  while response.status_code == 202:
      time.sleep(10)
      response = requests.get(f'{github_api_url}/{metric_type}{query_parameters}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
  return response
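
The loop above keeps retrying for as long as the API answers 202. A bounded variant is sketched below, under the assumption that giving up after a few attempts is acceptable. The name `get_with_retry` and its parameters are hypothetical, and `fetch` is any zero-argument callable that returns an object with a `status_code` attribute, so the sketch can be exercised without network access.

```python
import time
from types import SimpleNamespace

def get_with_retry(fetch, max_attempts=5, wait_seconds=10, sleep=time.sleep):
    """Call fetch() until the status code is not 202 or the attempts run out."""
    response = fetch()
    attempts = 1
    while response.status_code == 202 and attempts < max_attempts:
        sleep(wait_seconds)  # 202 = request accepted, result not ready yet
        response = fetch()
        attempts += 1
    return response

# Stand-in for the API: two "not ready" answers, then success.
fake_codes = iter([202, 202, 200])
fake_fetch = lambda: SimpleNamespace(status_code=next(fake_codes))
result = get_with_retry(fake_fetch, sleep=lambda seconds: None)
print(result.status_code)  # 200
```

With the notebook's helper, `fetch` could be something like `lambda: metric_get('traffic/views')`. The caller still needs to check the final status code, since the last response may remain 202.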

---
## Data Exploration

The following subsections retrieve and format data from the traffic and engagement parts of the API.

### /traffic/clones
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-repository-clones
- 14 day history of clones
- schema:
    - count = total clones for the window
    - uniques = unique cloners across window (not the sum of daily)
    - clones:
        - timestamp = midnight of day (start of day)

In [8]:
metric_type = 'traffic/clones'
response = metric_get(metric_type)
response.status_code

200

In [9]:
#json.loads(response.text)

In [10]:
traffic_clones = pd.DataFrame(json.loads(response.text)['clones'])
traffic_clones['14day_uniques'] = np.nan
# store the window-level unique count on the most recent day only; a single .loc call avoids chained assignment
traffic_clones.loc[traffic_clones.index[-1], '14day_uniques'] = json.loads(response.text)['uniques']
traffic_clones['repo'] = github_user + '/' + github_repo

traffic_clones


Unnamed: 0,timestamp,count,uniques,14day_uniques,repo
0,2023-02-09T00:00:00Z,11,5,,statmike/vertex-ai-mlops
1,2023-02-10T00:00:00Z,28,17,,statmike/vertex-ai-mlops
2,2023-02-11T00:00:00Z,10,6,,statmike/vertex-ai-mlops
3,2023-02-12T00:00:00Z,9,6,,statmike/vertex-ai-mlops
4,2023-02-13T00:00:00Z,6,6,,statmike/vertex-ai-mlops
5,2023-02-14T00:00:00Z,29,7,,statmike/vertex-ai-mlops
6,2023-02-15T00:00:00Z,13,8,,statmike/vertex-ai-mlops
7,2023-02-16T00:00:00Z,20,19,,statmike/vertex-ai-mlops
8,2023-02-17T00:00:00Z,3,2,,statmike/vertex-ai-mlops
9,2023-02-18T00:00:00Z,14,6,,statmike/vertex-ai-mlops


### /traffic/popular/paths
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-top-referral-paths
- top 10 documents for past 14 days

In [30]:
metric_type = 'traffic/popular/paths'
response = metric_get(metric_type)
response.status_code

200

In [31]:
#json.loads(response.text)

In [32]:
traffic_popular_paths = pd.DataFrame(json.loads(response.text))
traffic_popular_paths

Unnamed: 0,path,title,count,uniques
0,/statmike/vertex-ai-mlops,statmike/vertex-ai-mlops: Google Cloud Platfor...,643,243
1,/statmike/vertex-ai-mlops/blob/main/00%20-%20S...,vertex-ai-mlops/00 - Environment Setup.ipynb a...,78,46
2,/statmike/vertex-ai-mlops/tree/main/04%20-%20s...,vertex-ai-mlops/04 - scikit-learn at main · st...,75,42
3,/statmike/vertex-ai-mlops/tree/main/00%20-%20S...,vertex-ai-mlops/00 - Setup at main · statmike/...,73,50
4,/statmike/vertex-ai-mlops/tree/main/02%20-%20V...,vertex-ai-mlops/02 - Vertex AI AutoML at main ...,69,46
5,/statmike/vertex-ai-mlops/tree/main/05%20-%20T...,vertex-ai-mlops/05 - TensorFlow at main · stat...,55,34
6,/statmike/vertex-ai-mlops/blob/main/01%20-%20D...,vertex-ai-mlops/01 - BigQuery - Table Data Sou...,55,30
7,/statmike/vertex-ai-mlops/tree/main/01%20-%20D...,vertex-ai-mlops/01 - Data Sources at main · st...,51,29
8,/statmike/vertex-ai-mlops/tree/main/03%20-%20B...,vertex-ai-mlops/03 - BigQuery ML (BQML) at mai...,39,23
9,/statmike/vertex-ai-mlops/blob/main/architectu...,vertex-ai-mlops/05_overview.png at main · stat...,36,19


In [33]:
# remove title
# parse path: URL-decode, remove blob/main/ and tree/main/; a path with no file extension is a folder, so point it to readme.md
# add today's date (or yesterday?)

In [34]:
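def parse_path(p):
    # decode URL escapes (for example %20 -> space) once, then drop the branch prefixes
    p = urllib.parse.unquote(p).replace('blob/main/', '').replace('tree/main/', '')
    # no '.' after the last '/' means there is no file extension: treat the path as a folder and point to its readme
    if p.rfind('.') == -1 or (p.rfind('.') < p.rfind('/')):
        p += '/readme.md'
    return p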
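
A short worked example of the path parsing follows. It restates `parse_path` so that it runs on its own. The encoded input paths are illustrative, chosen so that they decode to file names listed in the output of this section.

```python
from urllib.parse import unquote

def parse_path(p):
    # decode URL escapes, then drop the blob/main/ and tree/main/ prefixes
    p = unquote(p).replace('blob/main/', '').replace('tree/main/', '')
    # a path without a file extension is a folder: point it to its readme
    if p.rfind('.') == -1 or p.rfind('.') < p.rfind('/'):
        p += '/readme.md'
    return p

examples = [
    '/statmike/vertex-ai-mlops',
    '/statmike/vertex-ai-mlops/tree/main/04%20-%20scikit-learn',
    '/statmike/vertex-ai-mlops/blob/main/00%20-%20Setup/00%20-%20Environment%20Setup.ipynb',
]
for path in examples:
    print(parse_path(path))
# /statmike/vertex-ai-mlops/readme.md
# /statmike/vertex-ai-mlops/04 - scikit-learn/readme.md
# /statmike/vertex-ai-mlops/00 - Setup/00 - Environment Setup.ipynb
```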

In [35]:
traffic_popular_paths['file'] = traffic_popular_paths.apply(lambda x: parse_path(x['path']), axis = 1)
traffic_popular_paths = traffic_popular_paths.drop(['title', 'path'], axis = 1)
traffic_popular_paths['timestamp'] = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
traffic_popular_paths['repo'] = github_user + '/' + github_repo

In [36]:
list(traffic_popular_paths['file'])

['/statmike/vertex-ai-mlops/readme.md',
 '/statmike/vertex-ai-mlops/00 - Setup/00 - Environment Setup.ipynb',
 '/statmike/vertex-ai-mlops/04 - scikit-learn/readme.md',
 '/statmike/vertex-ai-mlops/00 - Setup/readme.md',
 '/statmike/vertex-ai-mlops/02 - Vertex AI AutoML/readme.md',
 '/statmike/vertex-ai-mlops/05 - TensorFlow/readme.md',
 '/statmike/vertex-ai-mlops/01 - Data Sources/01 - BigQuery - Table Data Source.ipynb',
 '/statmike/vertex-ai-mlops/01 - Data Sources/readme.md',
 '/statmike/vertex-ai-mlops/03 - BigQuery ML (BQML)/readme.md',
 '/statmike/vertex-ai-mlops/architectures/overview/05_overview.png']

In [37]:
traffic_popular_paths

Unnamed: 0,count,uniques,file,timestamp,repo
0,643,243,/statmike/vertex-ai-mlops/readme.md,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
1,78,46,/statmike/vertex-ai-mlops/00 - Setup/00 - Envi...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
2,75,42,/statmike/vertex-ai-mlops/04 - scikit-learn/re...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
3,73,50,/statmike/vertex-ai-mlops/00 - Setup/readme.md,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
4,69,46,/statmike/vertex-ai-mlops/02 - Vertex AI AutoM...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
5,55,34,/statmike/vertex-ai-mlops/05 - TensorFlow/read...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
6,55,30,/statmike/vertex-ai-mlops/01 - Data Sources/01...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
7,51,29,/statmike/vertex-ai-mlops/01 - Data Sources/re...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
8,39,23,/statmike/vertex-ai-mlops/03 - BigQuery ML (BQ...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops
9,36,19,/statmike/vertex-ai-mlops/architectures/overvi...,2023-02-23T19:47:14Z,statmike/vertex-ai-mlops


### /traffic/popular/referrers
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-top-referral-sources
- top 10 referring sites over past 14 days

In [38]:
metric_type = 'traffic/popular/referrers'
response = metric_get(metric_type)
response.status_code

200

In [39]:
#json.loads(response.text)

In [40]:
traffic_popular_referrers = pd.DataFrame(json.loads(response.text))
traffic_popular_referrers

Unnamed: 0,referrer,count,uniques
0,youtube.com,521,125
1,github.com,233,48
2,Google,207,62
3,statics.teams.cdn.office.net,10,2
4,notebooks.githubusercontent.com,8,5
5,m.youtube.com,6,1
6,mail.google.com,2,2
7,colab.research.google.com,1,1


In [113]:
# add today's date (or yesterday?)

In [41]:
traffic_popular_referrers['timestamp'] = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
traffic_popular_referrers['repo'] = github_user + '/' + github_repo

traffic_popular_referrers

Unnamed: 0,referrer,count,uniques,timestamp,repo
0,youtube.com,521,125,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
1,github.com,233,48,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
2,Google,207,62,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
3,statics.teams.cdn.office.net,10,2,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
4,notebooks.githubusercontent.com,8,5,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
5,m.youtube.com,6,1,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
6,mail.google.com,2,2,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops
7,colab.research.google.com,1,1,2023-02-23T19:49:00Z,statmike/vertex-ai-mlops


### /traffic/views
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-page-views
- daily views for last 14 days
- schema:
    - count = total views for last 2 weeks (sum of daily)
    - uniques = total unique over 14 days (not sum of daily)
    - views:
        - timestamp - daily at midnight
        - count = daily count
        - uniques = daily unique count
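
The note that the 14-day `uniques` is not the sum of the daily values can be illustrated with a small sketch. The visitor names and dates below are made up for illustration and are not data from the API.

```python
# Hypothetical visitors per day; the same person can return on several days.
daily_visitors = {
    "2023-02-09": {"ann", "bob"},
    "2023-02-10": {"bob", "cat"},
    "2023-02-11": {"ann"},
}

# Summing daily uniques counts returning visitors once per day they appear.
sum_of_daily = sum(len(visitors) for visitors in daily_visitors.values())

# The window-level figure counts each visitor once across the whole window.
window_uniques = len(set().union(*daily_visitors.values()))

print(sum_of_daily, window_uniques)  # 5 3
```

This is why the window-level value is stored separately (in `14day_uniques`) instead of being derived from the daily rows.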

In [42]:
metric_type = 'traffic/views'
response = metric_get(metric_type)
response.status_code

200

In [43]:
#json.loads(response.text)

In [44]:
traffic_views = pd.DataFrame(json.loads(response.text)['views'])
traffic_views['14day_uniques'] = np.nan
# store the window-level unique count on the most recent day only; a single .loc call avoids chained assignment
traffic_views.loc[traffic_views.index[-1], '14day_uniques'] = json.loads(response.text)['uniques']
traffic_views


Unnamed: 0,timestamp,count,uniques,14day_uniques
0,2023-02-09T00:00:00Z,41,8,
1,2023-02-10T00:00:00Z,185,45,
2,2023-02-11T00:00:00Z,78,17,
3,2023-02-12T00:00:00Z,90,20,
4,2023-02-13T00:00:00Z,219,43,
5,2023-02-14T00:00:00Z,176,48,
6,2023-02-15T00:00:00Z,118,37,
7,2023-02-16T00:00:00Z,162,35,
8,2023-02-17T00:00:00Z,157,38,
9,2023-02-18T00:00:00Z,87,18,


In [45]:
traffic_views['repo'] = github_user + '/' + github_repo

traffic_views

Unnamed: 0,timestamp,count,uniques,14day_uniques,repo
0,2023-02-09T00:00:00Z,41,8,,statmike/vertex-ai-mlops
1,2023-02-10T00:00:00Z,185,45,,statmike/vertex-ai-mlops
2,2023-02-11T00:00:00Z,78,17,,statmike/vertex-ai-mlops
3,2023-02-12T00:00:00Z,90,20,,statmike/vertex-ai-mlops
4,2023-02-13T00:00:00Z,219,43,,statmike/vertex-ai-mlops
5,2023-02-14T00:00:00Z,176,48,,statmike/vertex-ai-mlops
6,2023-02-15T00:00:00Z,118,37,,statmike/vertex-ai-mlops
7,2023-02-16T00:00:00Z,162,35,,statmike/vertex-ai-mlops
8,2023-02-17T00:00:00Z,157,38,,statmike/vertex-ai-mlops
9,2023-02-18T00:00:00Z,87,18,,statmike/vertex-ai-mlops


### /stargazers
- https://docs.github.com/en/rest/activity/starring?apiVersion=2022-11-28#list-stargazers
- list of current users who have starred the repository

In [46]:
metric_type = 'stargazers'

page_size = 100
page = 1
raw = []
while page_size == 100:
    response = metric_get(metric_type, f'?per_page={page_size}&page={page}')
    raw_new = json.loads(response.text)
    raw += raw_new
    page_size = len(raw_new)
    page += 1
len(raw)

148
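
The same paging loop is reused below for `/forks` and `/subscribers`. As a sketch only, it could be factored into one helper. The name `fetch_all_pages` and the `get_page` callable are hypothetical, and the fake user list stands in for API responses so that the example runs offline.

```python
def fetch_all_pages(get_page, per_page=100):
    """Collect items from a paged endpoint until a short (or empty) page arrives."""
    items, page = [], 1
    while True:
        batch = get_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:  # a short page means there is nothing left
            return items
        page += 1

# Offline stand-in for the API: 250 fake users served 100 at a time.
fake_users = [{"login": f"user{i}"} for i in range(250)]

def fake_get_page(page, per_page):
    start = (page - 1) * per_page
    return fake_users[start:start + per_page]

print(len(fetch_all_pages(fake_get_page)))  # 250
```

With the notebook's helper, `get_page` could be something like `lambda page, n: metric_get('stargazers', f'?per_page={n}&page={page}').json()`.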

In [47]:
#raw[0]

In [48]:
stargazers = pd.DataFrame(raw)[['login']]
stargazers

Unnamed: 0,login
0,newcooldiscoveries
1,giranntu
2,sinanek
3,amith-ajith
4,rsavoie
...,...
143,JosephDavis
144,dunncw
145,PeterGolovatyi
146,littlefish0331


In [128]:
# add columns for added, dropped, count

In [49]:
stargazers['added'] = ''
stargazers['dropped'] = ''
stargazers['count'] = 1
stargazers['repo'] = github_user + '/' + github_repo

stargazers

Unnamed: 0,login,added,dropped,count,repo
0,newcooldiscoveries,,,1,statmike/vertex-ai-mlops
1,giranntu,,,1,statmike/vertex-ai-mlops
2,sinanek,,,1,statmike/vertex-ai-mlops
3,amith-ajith,,,1,statmike/vertex-ai-mlops
4,rsavoie,,,1,statmike/vertex-ai-mlops
...,...,...,...,...,...
143,JosephDavis,,,1,statmike/vertex-ai-mlops
144,dunncw,,,1,statmike/vertex-ai-mlops
145,PeterGolovatyi,,,1,statmike/vertex-ai-mlops
146,littlefish0331,,,1,statmike/vertex-ai-mlops


### /forks
- https://docs.github.com/en/rest/repos/forks?apiVersion=2022-11-28#list-forks
- list of current forks of main repository

In [50]:
metric_type = 'forks'

page_size = 100
page = 1
raw = []
while page_size == 100:
    response = metric_get(metric_type, f'?per_page={page_size}&page={page}')
    raw_new = json.loads(response.text)
    raw += raw_new
    page_size = len(raw_new)
    page += 1
len(raw)

73

In [51]:
#raw[0]

In [52]:
forks = []
for f in raw:
    forks += [{
        'name': f['name'],
        'full_name': f['full_name'],
        'owner': f['owner']['login'],
        'stars': f['stargazers_count'],
        'watchers': f['watchers_count'],
        'forks': f['forks_count']
    }]
forks = pd.DataFrame(forks)
forks

Unnamed: 0,name,full_name,owner,stars,watchers,forks
0,vertex-ai-mlops,yfumero/vertex-ai-mlops,yfumero,0,0,0
1,vertex-ai-mlops,ivanmkc/vertex-ai-mlops,ivanmkc,0,0,0
2,vertex-ai-mlops,xjaztek/vertex-ai-mlops,xjaztek,0,0,0
3,vertex-ai-mlops,praneethkumar4/vertex-ai-mlops,praneethkumar4,0,0,0
4,vertex-ai-mlops,psod18/vertex-ai-mlops,psod18,0,0,0
...,...,...,...,...,...,...
68,vertex-ai-mlops,danielnguyen-ds/vertex-ai-mlops,danielnguyen-ds,0,0,0
69,vertex-ai-mlops,justinjm/vertex-ai-mlops,justinjm,0,0,0
70,vertex-ai-mlops,motconmeobuon/vertex-ai-mlops,motconmeobuon,0,0,0
71,vertex-ai-mlops,ANN-KOREA/vertex-ai-mlops,ANN-KOREA,0,0,0


In [135]:
# add columns for added, dropped, count

In [53]:
forks['added'] = ''
forks['dropped'] = ''
forks['count'] = 1
forks['repo'] = github_user + '/' + github_repo

forks

Unnamed: 0,name,full_name,owner,stars,watchers,forks,added,dropped,count,repo
0,vertex-ai-mlops,yfumero/vertex-ai-mlops,yfumero,0,0,0,,,1,statmike/vertex-ai-mlops
1,vertex-ai-mlops,ivanmkc/vertex-ai-mlops,ivanmkc,0,0,0,,,1,statmike/vertex-ai-mlops
2,vertex-ai-mlops,xjaztek/vertex-ai-mlops,xjaztek,0,0,0,,,1,statmike/vertex-ai-mlops
3,vertex-ai-mlops,praneethkumar4/vertex-ai-mlops,praneethkumar4,0,0,0,,,1,statmike/vertex-ai-mlops
4,vertex-ai-mlops,psod18/vertex-ai-mlops,psod18,0,0,0,,,1,statmike/vertex-ai-mlops
...,...,...,...,...,...,...,...,...,...,...
68,vertex-ai-mlops,danielnguyen-ds/vertex-ai-mlops,danielnguyen-ds,0,0,0,,,1,statmike/vertex-ai-mlops
69,vertex-ai-mlops,justinjm/vertex-ai-mlops,justinjm,0,0,0,,,1,statmike/vertex-ai-mlops
70,vertex-ai-mlops,motconmeobuon/vertex-ai-mlops,motconmeobuon,0,0,0,,,1,statmike/vertex-ai-mlops
71,vertex-ai-mlops,ANN-KOREA/vertex-ai-mlops,ANN-KOREA,0,0,0,,,1,statmike/vertex-ai-mlops


### /subscribers
- https://docs.github.com/en/rest/activity/watching?apiVersion=2022-11-28#list-watchers
- list of watchers for repository

In [54]:
metric_type = 'subscribers'

page_size = 100
page = 1
raw = []
while page_size == 100:
    response = metric_get(metric_type, f'?per_page={page_size}&page={page}')
    raw_new = json.loads(response.text)
    raw += raw_new
    page_size = len(raw_new)
    page += 1
len(raw)

12

In [55]:
#raw[0]

In [56]:
subscribers = pd.DataFrame(raw)[['login']]
subscribers

Unnamed: 0,login
0,statmike
1,sinanek
2,inardini
3,rafal-wasowski
4,majacaci00
5,hamehrabi
6,alvaroferrerrizzo
7,rmazara-kinaxis
8,slopez-lmes
9,drkostas


In [57]:
# add columns for added, dropped, count

In [58]:
subscribers['added'] = ''
subscribers['dropped'] = ''
subscribers['count'] = 1
subscribers['repo'] = github_user + '/' + github_repo

subscribers

Unnamed: 0,login,added,dropped,count,repo
0,statmike,,,1,statmike/vertex-ai-mlops
1,sinanek,,,1,statmike/vertex-ai-mlops
2,inardini,,,1,statmike/vertex-ai-mlops
3,rafal-wasowski,,,1,statmike/vertex-ai-mlops
4,majacaci00,,,1,statmike/vertex-ai-mlops
5,hamehrabi,,,1,statmike/vertex-ai-mlops
6,alvaroferrerrizzo,,,1,statmike/vertex-ai-mlops
7,rmazara-kinaxis,,,1,statmike/vertex-ai-mlops
8,slopez-lmes,,,1,statmike/vertex-ai-mlops
9,drkostas,,,1,statmike/vertex-ai-mlops


---
## Pandas Tables

In [59]:
# none to combine from above... yet

---
## BigQuery Tables: Initial Creation

In [None]:
def bq_loader(df, table_name):
    # load a dataframe to the BigQuery table BQ_PROJECT.BQ_DATASET.table_name
    load_job = bq.load_table_from_dataframe(
        dataframe = df,
        destination = bigquery.TableReference.from_string(f"{BQ_PROJECT}.{BQ_DATASET}.{table_name}"),
        job_config = bigquery.LoadJobConfig(
            write_disposition = 'WRITE_TRUNCATE', # WRITE_TRUNCATE = replace if exists, WRITE_APPEND = append if exists, WRITE_EMPTY = write new but dont overwrite
            autodetect = True, # detect schema
        )
    )
    return load_job.result()

In [None]:
# table names match the dataframe names
bq_loader(traffic_clones, 'traffic_clones')
bq_loader(traffic_popular_paths, 'traffic_popular_paths')
bq_loader(traffic_popular_referrers, 'traffic_popular_referrers')
bq_loader(traffic_views, 'traffic_views')
bq_loader(stargazers, 'stargazers')
bq_loader(forks, 'forks')
bq_loader(subscribers, 'subscribers')

In [None]:
list(bq.list_tables(
     dataset = bigquery.DatasetReference(
         project = BQ_PROJECT,
         dataset_id = BQ_DATASET
     )
))

---
## BigQuery Tables: Increment

Approach:
- Forward incrementing, same time or later
- Efficiency
    - only pull what is needed
    - only replace what is changed or changable
    - only append what is new
    - only update as often as needed


### /traffic/clones
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-repository-clones
- 14 day history of clones
- increment:
    - retrieve records where timestamp >= min
        - since stored as string, on BQ side this will require convert, compare, select
    - match records on date
    - keep highest values of count, uniques, 14 day uniques
        - why? because GitHub truncates first and last day of return based on last calculation time
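
The match-and-keep-highest step above can be sketched in plain Python. The function name `merge_keep_highest` is hypothetical. The sample rows reuse the first two days shown in this notebook: the first pull returned 11 clones for 2023-02-09, while the later pull returned only 2 for the same day. The `14day_uniques` column is left out to keep the sketch short.

```python
def merge_keep_highest(stored, fresh, key="timestamp", fields=("count", "uniques")):
    """Match rows on the timestamp and keep the larger value of each field."""
    merged = {row[key]: dict(row) for row in stored}
    for row in fresh:
        current = merged.get(row[key])
        if current is None:
            merged[row[key]] = dict(row)  # a new day: append it
        else:
            for field in fields:
                current[field] = max(current[field], row[field])
    # ISO-8601 strings sort chronologically, so a plain sort keeps date order
    return [merged[k] for k in sorted(merged)]

stored = [
    {"timestamp": "2023-02-09T00:00:00Z", "count": 11, "uniques": 5},
    {"timestamp": "2023-02-10T00:00:00Z", "count": 28, "uniques": 17},
]
fresh = [
    {"timestamp": "2023-02-09T00:00:00Z", "count": 2, "uniques": 2},
    {"timestamp": "2023-02-10T00:00:00Z", "count": 28, "uniques": 17},
]
print(merge_keep_highest(stored, fresh)[0])
# {'timestamp': '2023-02-09T00:00:00Z', 'count': 11, 'uniques': 5}
```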

In [62]:
metric_type = 'traffic/clones'
response = metric_get(metric_type)

traffic_clones = pd.DataFrame(json.loads(response.text)['clones'])
traffic_clones['14day_uniques'] = np.nan
# store the window-level unique count on the most recent day only; a single .loc call avoids chained assignment
traffic_clones.loc[traffic_clones.index[-1], '14day_uniques'] = json.loads(response.text)['uniques']
traffic_clones['repo'] = github_user + '/' + github_repo

traffic_clones


Unnamed: 0,timestamp,count,uniques,14day_uniques,repo
0,2023-02-09T00:00:00Z,2,2,,statmike/vertex-ai-mlops
1,2023-02-10T00:00:00Z,28,17,,statmike/vertex-ai-mlops
2,2023-02-11T00:00:00Z,10,6,,statmike/vertex-ai-mlops
3,2023-02-12T00:00:00Z,9,6,,statmike/vertex-ai-mlops
4,2023-02-13T00:00:00Z,6,6,,statmike/vertex-ai-mlops
5,2023-02-14T00:00:00Z,29,7,,statmike/vertex-ai-mlops
6,2023-02-15T00:00:00Z,13,8,,statmike/vertex-ai-mlops
7,2023-02-16T00:00:00Z,20,19,,statmike/vertex-ai-mlops
8,2023-02-17T00:00:00Z,3,2,,statmike/vertex-ai-mlops
9,2023-02-18T00:00:00Z,14,6,,statmike/vertex-ai-mlops


### /traffic/popular/paths
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-top-referral-paths
- top 10 documents for past 14 days
- increment:
    - append only

In [63]:
metric_type = 'traffic/popular/paths'
response = metric_get(metric_type)

traffic_popular_paths = pd.DataFrame(json.loads(response.text))

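def parse_path(p):
    # decode URL escapes (for example %20 -> space) once, then drop the branch prefixes
    p = urllib.parse.unquote(p).replace('blob/main/', '').replace('tree/main/', '')
    # no '.' after the last '/' means there is no file extension: treat the path as a folder and point to its readme
    if p.rfind('.') == -1 or (p.rfind('.') < p.rfind('/')):
        p += '/readme.md'
    return p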

traffic_popular_paths['file'] = traffic_popular_paths.apply(lambda x: parse_path(x['path']), axis = 1)
traffic_popular_paths = traffic_popular_paths.drop(['title', 'path'], axis = 1)
traffic_popular_paths['timestamp'] = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
traffic_popular_paths['repo'] = github_user + '/' + github_repo

traffic_popular_paths

Unnamed: 0,count,uniques,file,timestamp,repo
0,643,243,/statmike/vertex-ai-mlops/readme.md,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
1,78,46,/statmike/vertex-ai-mlops/00 - Setup/00 - Envi...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
2,75,42,/statmike/vertex-ai-mlops/04 - scikit-learn/re...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
3,73,50,/statmike/vertex-ai-mlops/00 - Setup/readme.md,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
4,69,46,/statmike/vertex-ai-mlops/02 - Vertex AI AutoM...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
5,55,34,/statmike/vertex-ai-mlops/05 - TensorFlow/read...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
6,55,30,/statmike/vertex-ai-mlops/01 - Data Sources/01...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
7,51,29,/statmike/vertex-ai-mlops/01 - Data Sources/re...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
8,39,23,/statmike/vertex-ai-mlops/03 - BigQuery ML (BQ...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops
9,36,19,/statmike/vertex-ai-mlops/architectures/overvi...,2023-02-23T21:06:49Z,statmike/vertex-ai-mlops


### /traffic/popular/referrers
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-top-referral-sources
- top 10 referring sites over past 14 days
- increment:
    - append only

In [64]:
metric_type = 'traffic/popular/referrers'
response = metric_get(metric_type)

traffic_popular_referrers = pd.DataFrame(json.loads(response.text))
traffic_popular_referrers['timestamp'] = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
traffic_popular_referrers['repo'] = github_user + '/' + github_repo

traffic_popular_referrers

Unnamed: 0,referrer,count,uniques,timestamp,repo
0,youtube.com,521,125,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
1,github.com,233,48,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
2,Google,207,62,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
3,statics.teams.cdn.office.net,10,2,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
4,notebooks.githubusercontent.com,8,5,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
5,m.youtube.com,6,1,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
6,mail.google.com,2,2,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops
7,colab.research.google.com,1,1,2023-02-23T21:07:39Z,statmike/vertex-ai-mlops


### /traffic/views
- https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28#get-page-views
- daily views for last 14 days
- increment:
    - retrieve records where timestamp >= min
        - since stored as string, on BQ side this will require convert, compare, select
    - match records on date
    - keep highest values of count, uniques, 14 day uniques

In [65]:
metric_type = 'traffic/views'
response = metric_get(metric_type)

traffic_views = pd.DataFrame(json.loads(response.text)['views'])
traffic_views['14day_uniques'] = np.nan
# store the window-level unique count on the most recent day only; a single .loc call avoids chained assignment
traffic_views.loc[traffic_views.index[-1], '14day_uniques'] = json.loads(response.text)['uniques']
traffic_views['repo'] = github_user + '/' + github_repo

traffic_views


Unnamed: 0,timestamp,count,uniques,14day_uniques,repo
0,2023-02-09T00:00:00Z,18,5,,statmike/vertex-ai-mlops
1,2023-02-10T00:00:00Z,185,45,,statmike/vertex-ai-mlops
2,2023-02-11T00:00:00Z,78,17,,statmike/vertex-ai-mlops
3,2023-02-12T00:00:00Z,90,20,,statmike/vertex-ai-mlops
4,2023-02-13T00:00:00Z,219,43,,statmike/vertex-ai-mlops
5,2023-02-14T00:00:00Z,176,48,,statmike/vertex-ai-mlops
6,2023-02-15T00:00:00Z,118,37,,statmike/vertex-ai-mlops
7,2023-02-16T00:00:00Z,162,35,,statmike/vertex-ai-mlops
8,2023-02-17T00:00:00Z,157,38,,statmike/vertex-ai-mlops
9,2023-02-18T00:00:00Z,87,18,,statmike/vertex-ai-mlops


### /stargazers
- https://docs.github.com/en/rest/activity/starring?apiVersion=2022-11-28#list-stargazers
- list of current users who have starred the repository
- increment:
    - if new, append:
        - added = yesterday's date, dropped = blank, count = 1
    - if reoccur, if dropped is blank: do nothing
    - if reoccur, if dropped < yesterday's date, replace (delete, append):
        - dropped = blank, recent_added = yesterday's date, count += 1
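
The three increment rules above can be sketched in plain Python. The function name `increment_membership` is hypothetical, dates are assumed to be `YYYY-MM-DD` strings so that string comparison follows date order, and the sample logins are made up. The sketch covers only the rules listed; how `dropped` gets filled in when a user disappears from the list is not described here and is not handled.

```python
def increment_membership(stored, current_logins, yesterday):
    """Apply the append / do-nothing / re-add rules to stored rows keyed by login."""
    rows = {row["login"]: dict(row) for row in stored}
    for login in current_logins:
        row = rows.get(login)
        if row is None:
            # new: append with added = yesterday, dropped blank, count = 1
            rows[login] = {"login": login, "added": yesterday, "dropped": "", "count": 1}
        elif row["dropped"] == "":
            continue  # still present and never dropped: do nothing
        elif row["dropped"] < yesterday:
            # came back after being dropped: clear dropped, note the date, bump count
            row["dropped"] = ""
            row["recent_added"] = yesterday
            row["count"] += 1
    return list(rows.values())

stored = [
    {"login": "user_a", "added": "", "dropped": "", "count": 1},
    {"login": "user_b", "added": "", "dropped": "2023-02-20", "count": 1},
]
for row in increment_membership(stored, ["user_a", "user_b", "user_c"], "2023-02-22"):
    print(row)
```

The same function would apply to `/forks` and `/subscribers` with the matching key column.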

In [69]:
metric_type = 'stargazers'

page_size = 100
page = 1
raw = []
while page_size == 100:
    response = metric_get(metric_type, f'?per_page={page_size}&page={page}')
    raw_new = json.loads(response.text)
    raw += raw_new
    page_size = len(raw_new)
    page += 1

stargazers = pd.DataFrame(raw)[['login']]
stargazers['added'] = ''
stargazers['dropped'] = ''
stargazers['count'] = 1
stargazers['repo'] = github_user + '/' + github_repo

stargazers

Unnamed: 0,login,added,dropped,count,repo
0,newcooldiscoveries,,,1,statmike/vertex-ai-mlops
1,giranntu,,,1,statmike/vertex-ai-mlops
2,sinanek,,,1,statmike/vertex-ai-mlops
3,amith-ajith,,,1,statmike/vertex-ai-mlops
4,rsavoie,,,1,statmike/vertex-ai-mlops
...,...,...,...,...,...
143,JosephDavis,,,1,statmike/vertex-ai-mlops
144,dunncw,,,1,statmike/vertex-ai-mlops
145,PeterGolovatyi,,,1,statmike/vertex-ai-mlops
146,littlefish0331,,,1,statmike/vertex-ai-mlops


### /forks
- https://docs.github.com/en/rest/repos/forks?apiVersion=2022-11-28#list-forks
- list of current forks of main repository
- increment:
    - if new, append:
        - added = yesterday's date, dropped = blank, count = 1
    - if reoccur, if dropped is blank: do nothing
    - if reoccur, if dropped < yesterday's date, replace (delete, append):
        - dropped = blank, recent_added = yesterday's date, count += 1

In [68]:
metric_type = 'forks'

page_size = 100
page = 1
raw = []
while page_size == 100:
    response = metric_get(metric_type, f'?per_page={page_size}&page={page}')
    raw_new = json.loads(response.text)
    raw += raw_new
    page_size = len(raw_new)
    page += 1

forks = []
for f in raw:
    forks += [{
        'name': f['name'],
        'full_name': f['full_name'],
        'owner': f['owner']['login'],
        'stars': f['stargazers_count'],
        'watchers': f['watchers_count'],
        'forks': f['forks_count']
    }]
forks = pd.DataFrame(forks)
forks['added'] = ''
forks['dropped'] = ''
forks['count'] = 1
forks['repo'] = github_user + '/' + github_repo

forks

Unnamed: 0,name,full_name,owner,stars,watchers,forks,added,dropped,count,repo
0,vertex-ai-mlops,yfumero/vertex-ai-mlops,yfumero,0,0,0,,,1,statmike/vertex-ai-mlops
1,vertex-ai-mlops,ivanmkc/vertex-ai-mlops,ivanmkc,0,0,0,,,1,statmike/vertex-ai-mlops
2,vertex-ai-mlops,xjaztek/vertex-ai-mlops,xjaztek,0,0,0,,,1,statmike/vertex-ai-mlops
3,vertex-ai-mlops,praneethkumar4/vertex-ai-mlops,praneethkumar4,0,0,0,,,1,statmike/vertex-ai-mlops
4,vertex-ai-mlops,psod18/vertex-ai-mlops,psod18,0,0,0,,,1,statmike/vertex-ai-mlops
...,...,...,...,...,...,...,...,...,...,...
68,vertex-ai-mlops,danielnguyen-ds/vertex-ai-mlops,danielnguyen-ds,0,0,0,,,1,statmike/vertex-ai-mlops
69,vertex-ai-mlops,justinjm/vertex-ai-mlops,justinjm,0,0,0,,,1,statmike/vertex-ai-mlops
70,vertex-ai-mlops,motconmeobuon/vertex-ai-mlops,motconmeobuon,0,0,0,,,1,statmike/vertex-ai-mlops
71,vertex-ai-mlops,ANN-KOREA/vertex-ai-mlops,ANN-KOREA,0,0,0,,,1,statmike/vertex-ai-mlops


### /subscribers
- https://docs.github.com/en/rest/activity/watching?apiVersion=2022-11-28#list-watchers
- list of watchers for repository
- increment:
    - if new, append:
        - added = yesterday's date, dropped = blank, count = 1
    - if reoccur, if dropped is blank: do nothing
    - if reoccur, if dropped < yesterday's date, replace (delete, append):
        - dropped = blank, recent_added = yesterday's date, count += 1

In [70]:
metric_type = 'subscribers'

page_size = 100
page = 1
raw = []
while page_size == 100:
    response = metric_get(metric_type, f'?per_page={page_size}&page={page}')
    raw_new = json.loads(response.text)
    raw += raw_new
    page_size = len(raw_new)
    page += 1

subscribers = pd.DataFrame(raw)[['login']]
subscribers['added'] = ''
subscribers['dropped'] = ''
subscribers['count'] = 1
subscribers['repo'] = github_user + '/' + github_repo

subscribers

Unnamed: 0,login,added,dropped,count,repo
0,statmike,,,1,statmike/vertex-ai-mlops
1,sinanek,,,1,statmike/vertex-ai-mlops
2,inardini,,,1,statmike/vertex-ai-mlops
3,rafal-wasowski,,,1,statmike/vertex-ai-mlops
4,majacaci00,,,1,statmike/vertex-ai-mlops
5,hamehrabi,,,1,statmike/vertex-ai-mlops
6,alvaroferrerrizzo,,,1,statmike/vertex-ai-mlops
7,rmazara-kinaxis,,,1,statmike/vertex-ai-mlops
8,slopez-lmes,,,1,statmike/vertex-ai-mlops
9,drkostas,,,1,statmike/vertex-ai-mlops


---
## Diagnostics