# File Description: repost_extraction_new.ipynb

This notebook extracts repost data from the TikTok Research API (the `research/user/reposted_videos` endpoint) and processes it. It is part of the data collection phase of the project and maps usernames to the videos they reposted.

## Purpose
This notebook is designed to:
1. Iterate through a subset of usernames and their associated reposted videos.
2. Combine the data into a dictionary where each username is mapped to their reposted videos.
3. Prepare the data for further analysis or storage.

## Key Workflow
1. **Data Subsetting**: Processes a specific range of usernames (e.g., `usernames[14009:15009]`) and their corresponding reposted videos.
2. **Data Mapping**: Constructs a dictionary (`combined`) where keys are usernames and values are the API response data for that user, which holds the list of reposted videos.
3. **Output**: Flattens the dictionary into a DataFrame with one row per reposted video and appends it to `reposted_videos.csv` for later analysis or integration with other datasets.
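
The subsetting and mapping steps above can be sketched as a single helper. This is a minimal illustration, not the notebook's own code: `build_repost_map` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real API call so the sketch runs offline.

```python
from typing import Callable, Dict, List


def build_repost_map(
    usernames: List[str],
    start: int,
    stop: int,
    fetch: Callable[[str], dict],
) -> Dict[str, dict]:
    """Map each username in usernames[start:stop] to whatever fetch returns."""
    return {username: fetch(username) for username in usernames[start:stop]}


# Hypothetical stand-in for the API call (assumption: it returns a dict
# with a "reposted_videos" list, as the flattening cell expects).
def fake_fetch(username: str) -> dict:
    return {"reposted_videos": [{"id": f"{username}-1"}]}


demo = build_repost_map(["a", "b", "c", "d"], 1, 3, fake_fetch)
```

Passing the slice bounds once keeps the fetch loop and the mapping loop from drifting apart when the range changes between runs.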

In [None]:
import pandas as pd
import requests

In [16]:
def get_access_token(client_key, client_secret):
    # Endpoint URL
    endpoint_url = "https://open.tiktokapis.com/v2/oauth/token/"

    # Request headers
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
    }

    # Request body parameters
    data = {
        'client_key': client_key,
        'client_secret': client_secret,
        'grant_type': 'client_credentials',
    }

    # Make the POST request
    response = requests.post(endpoint_url, headers=headers, data=data)

    # Return the token on success (status code 200)
    if response.status_code == 200:
        return response.json()['access_token']

    # Fail loudly so an error payload is never passed on as a token
    raise RuntimeError(
        f"Token request failed ({response.status_code}): {response.text}"
    )

In [18]:
def get_repost_info(username, access_token, fields="id, username, favorites_count"):
    query_params = {"fields": fields}
    query_body = {"username": username}
    headers = {"Authorization": f"Bearer {access_token}"}
    
    endpoint = "https://open.tiktokapis.com/v2/research/user/reposted_videos/"
    response = requests.post(endpoint, json=query_body, params=query_params, headers=headers)

    if response.status_code == 200:
        # Return the "data" object from the response (empty dict if absent)
        return response.json().get("data", {})

    # On failure, return the raw error text under an "error" key
    return {"error": response.text}

In [2]:
follows = pd.read_csv('../shared-folder-gald/data/follow-link.csv')

In [3]:
usernames = pd.unique(follows[['source', 'target']].values.ravel()).tolist()

In [15]:
with open("../shared-folder-gald/keys2.txt") as f:
    lines = f.readlines()
    client_key = lines[0].strip()
    client_secret = lines[1].strip()

In [19]:
access_token = get_access_token(client_key, client_secret)

Keeping track of calls
27.02 - we ran keys2 until 1000

1.03 - keys2 1001:2002

1.03 - keys1 2002:3003

2.03 - keys1 3003:4004

2.03 - keys2 4004:5005

3.03 - keys1 5005:6006

3.03 - keys2 6006:7007

4.03 - keys1 7007:8008

4.03 - keys2 8008:9009

5.03 - keys1 9009:10009

5.03 - keys2 10009:11009

6.03 - keys1 11009:12009

6.03 - keys2 12009:13009

11.03 - keys1 13009:14009

11.03 - keys2 14009:15009
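
The later entries in this log follow a regular pattern: consecutive 1000-username slices, alternating between the two key files. A small sketch of how such a plan could be generated, assuming a fixed batch size and strict alternation (`batch_bounds` and `plan` are hypothetical names, not part of the notebook):

```python
from itertools import cycle


def batch_bounds(start, stop, size):
    """Yield (lo, hi) slice bounds covering [start, stop) in chunks of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    for lo in range(start, stop, size):
        yield lo, min(lo + size, stop)


# Pair each slice with a key file, alternating keys1 / keys2 (assumption).
plan = [
    (key, lo, hi)
    for key, (lo, hi) in zip(cycle(["keys1", "keys2"]), batch_bounds(9009, 12009, 1000))
]
# plan == [("keys1", 9009, 10009), ("keys2", 10009, 11009), ("keys1", 11009, 12009)]
```

The resulting bounds match the 5.03 and 6.03 entries above; the earlier entries use slightly different step sizes, so they would not be reproduced by this fixed-size scheme.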

In [20]:
reposted_videos = []
for username in usernames[14009:15009]:
    reposted_videos.append(get_repost_info(username, access_token))

In [21]:
reposted_videos

[]

In [23]:
combined = {}
for i, username in enumerate(usernames[14009:15009]):
    combined[username] = reposted_videos[i]
    
combined 


{}

In [None]:
rows = []

for key, value in combined.items():
    if value and 'reposted_videos' in value: 
        for video in value['reposted_videos']:
            rows.append({
                'username': key,  
                'video_id': video['id'], 
                'creator': video['username'], 
                'favourites': video["favorites_count"]
            })

reposted_df = pd.DataFrame(rows)

print(reposted_df)

Empty DataFrame
Columns: []
Index: []


In [25]:
reposted_df.to_csv("reposted_videos.csv", mode="a", index=False, header=False)
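
Because each batch is appended with `header=False`, the CSV never receives a header row, and an empty batch still touches the file. One possible variant, assuming a header row is wanted in the output, writes the header only when the file is new or empty and skips empty batches. `append_rows` is a hypothetical helper; the demo writes to a temporary directory rather than the real output file.

```python
import os
import tempfile

import pandas as pd


def append_rows(df, path):
    """Append df to the CSV at path; return the number of rows written.

    The header is written only when the file does not exist yet or is empty.
    """
    if df.empty:
        return 0
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", index=False, header=needs_header)
    return len(df)


# Demo with made-up rows, using the same column names as the flattening cell.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "reposted_videos.csv")
    batch = pd.DataFrame(
        [{"username": "u1", "video_id": "1", "creator": "c1", "favourites": 3}]
    )
    append_rows(batch, demo_path)
    append_rows(batch, demo_path)  # second batch: no duplicate header
    result = pd.read_csv(demo_path)
```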