# Florida free-speech project

The goal of this project is to collect transcripts of town hall meetings in Florida cities and towns for research purposes.

In practice, starting from a given list of towns in Florida, I first used the YouTube API to search for each town's or city's official channel. Then I asked ChatGPT to evaluate whether the channel seems official based on its title and description. Lastly, I called the YouTube API to list all videos from the channel and fetched the transcript for each video.

## Setup

The repo is called [transcripts](https://github.com/nesaboz/transcripts):

In [None]:
def get_github_code():
    # First get GitHub code:
    !wget https://github.com/nesaboz/transcripts/archive/refs/heads/main.zip
    # unzip it
    !unzip main.zip
    # copy all the files to root
    !mv ./transcripts-main/* .
    # delete the empty folder
    !rm -r transcripts-main
    # delete zip file
    !rm main.zip
    # delete main.ipynb since it's confusing to have it in Colab:
    !rm main.ipynb

def install_packages():
    !pip install -r requirements.txt

In [None]:
try:
    from google.colab import drive
    IS_COLAB = True
except ModuleNotFoundError:
    IS_COLAB = False


if IS_COLAB: 
    response = input("Do you want to setup everything? ([yes]/no): ").lower().strip()
    if response != "no":
        !rm -r sample_data  # remove Colab's default sample_data folder to keep the workspace tidy
        get_github_code()
        drive.mount('/content/drive')
        install_packages()

## Imports

In [47]:
from utils import ChannelCrawler, ChannelAnalyzer, aggregate_analysis_files, Channel, VideoInfo

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Search for YouTube channels

We go over the list of all cities in Florida and search YouTube for "city of XYZ, Florida" and "town of XYZ, Florida". This is what the `ChannelCrawler` class does; see its docstring for details.

In [48]:
crawler = ChannelCrawler(search_query_fns=[lambda x: f"town of {x}, Florida", lambda x: f"city of {x}, Florida"])

File status.csv not found. Creating a new one.
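
To make the role of `search_query_fns` concrete, here is a minimal sketch of how a list of query templates can be expanded over a list of city names. The helper `build_queries` and the two city names are illustrative assumptions; the actual logic inside `ChannelCrawler` may differ.

```python
from typing import Callable, Iterable, List


def build_queries(cities: Iterable[str],
                  query_fns: List[Callable[[str], str]]) -> List[str]:
    """Apply every query template to every city name (hypothetical helper)."""
    return [fn(city) for city in cities for fn in query_fns]


# Same templates as passed to ChannelCrawler above.
query_fns = [
    lambda x: f"town of {x}, Florida",
    lambda x: f"city of {x}, Florida",
]

# Illustrative city names only; the real list comes from the project's assets.
queries = build_queries(["Ocala", "Destin"], query_fns)
for query in queries:
    print(query)
```

Each city therefore produces one search per template, which matters for the API quota.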


Now start crawling. The limit is infinite by default, though you will eventually hit the YouTube API quota limit:

In [None]:
crawler.start(limit=10)
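
As a rough guide to how far one day of quota goes, the sketch below does the arithmetic. It assumes the project uses the default YouTube Data API allocation of 10,000 units per day, that each search request costs 100 units, and that every city triggers one search per query template (two here). How `limit` is counted inside `ChannelCrawler` is not shown in this notebook, so treat the result as an estimate.

```python
DAILY_QUOTA_UNITS = 10_000  # assumption: default YouTube Data API daily quota
SEARCH_COST_UNITS = 100     # quota cost of a single search request
QUERIES_PER_CITY = 2        # "town of ..." and "city of ..." templates


def cities_per_day(daily_quota: int = DAILY_QUOTA_UNITS,
                   search_cost: int = SEARCH_COST_UNITS,
                   queries_per_city: int = QUERIES_PER_CITY) -> int:
    """Estimate how many cities can be searched before the quota runs out."""
    return daily_quota // (search_cost * queries_per_city)


print(cities_per_day())
```

Under these assumptions the crawl covers about 50 cities per day, so a full pass over the city list may need to be spread across several days.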

If all goes well, there should now be a folder called `responses` in the root.

## Analysis

For each JSON response in `responses`, we now ask ChatGPT to determine whether the channel is official. This creates a new folder, `analysis`, containing CSV files with yes/no answers, and updates `status.csv`. We first create the analyzer and then run it:

In [53]:
analyzer = ChannelAnalyzer(
    model_name="gpt-4",  # "gpt-3.5-turbo"
    prompt_fn= lambda x: f"Your job will be to analyze a short text, \
comprised of a title and a description of a YouTube channel, to assess whether this \
text corresponds to an official YouTube channel of a city {x}, in Florida. Your answer should be 'Yes' or 'No' only")

In [54]:
analyzer.start()
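
Even when the prompt asks for 'Yes' or 'No' only, a model can return variations such as `Yes.` or ` no`. The sketch below shows one way to normalize such answers before saving them. The function `parse_yes_no` is a hypothetical helper, not part of `ChannelAnalyzer`, whose actual handling is not shown here.

```python
from typing import Optional


def parse_yes_no(answer: str) -> Optional[bool]:
    """Map a model reply to True/False, or None if it is not a clear yes/no."""
    # Drop surrounding whitespace, then surrounding quotes and punctuation.
    cleaned = answer.strip().strip(".!'\"").lower()
    if cleaned == "yes":
        return True
    if cleaned == "no":
        return False
    return None  # ambiguous reply; worth a manual check


for reply in [" Yes. ", "'No'", "Probably yes"]:
    print(repr(reply), "->", parse_yes_no(reply))
```

Returning `None` for anything else keeps ambiguous replies from being silently counted as negative.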

## Aggregation

We now aggregate the results into an Excel file, very similar to `assets/cities_to_collect.xlsx`, storing only the positive results:

In [None]:
aggregate_analysis_files(crawler, 'aggregated_analysis.xlsx')
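
For intuition, the filtering step of the aggregation can be sketched with the standard library alone. This is not the implementation of `aggregate_analysis_files`: the column name `answer` and the helper `collect_positive_rows` are assumptions made for illustration, and the sketch stops before writing the Excel file.

```python
import csv
from pathlib import Path
from typing import Dict, List


def collect_positive_rows(analysis_dir: str) -> List[Dict[str, str]]:
    """Read every CSV in a folder and keep only rows answered 'yes'.

    Assumes each file has a header row that includes an 'answer' column.
    """
    rows: List[Dict[str, str]] = []
    for csv_path in sorted(Path(analysis_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                if (row.get("answer") or "").strip().lower() == "yes":
                    rows.append(row)
    return rows
```

Calling `collect_positive_rows("analysis")` would then return one dictionary per channel judged official, ready to be written out in whatever format is needed.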

# Get all live videos from one channel

We now pass in a channel ID and get all the live videos from that channel. As an example, take the city of Belleair Beach (this city is NOT in the provided list of cities):

In [None]:
# channel = Channel('UCBTiCuq7bdOfOjqAnHY0zbA')
channel = Channel('UCm9YZSpPqHckVrtDdrL3isw')


In [None]:
channel.get_videos()
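
Because channel IDs and video IDs are easy to mix up when pasted by hand, a quick sanity check before calling the API can save quota. The sketch below relies on the commonly observed shape of channel IDs (the prefix `UC` followed by 22 URL-safe characters). This is an observed convention rather than a documented guarantee, and `looks_like_channel_id` is a hypothetical helper, not part of `Channel`.

```python
import re

# Assumption: channel IDs are "UC" plus 22 characters from [A-Za-z0-9_-].
CHANNEL_ID_PATTERN = re.compile(r"UC[A-Za-z0-9_-]{22}")


def looks_like_channel_id(value: str) -> bool:
    """Return True if the string has the usual shape of a channel ID."""
    return CHANNEL_ID_PATTERN.fullmatch(value) is not None


print(looks_like_channel_id("UCm9YZSpPqHckVrtDdrL3isw"))  # channel ID used above
print(looks_like_channel_id("thGB9IILDOw"))               # a video ID, not a channel
```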

## Extract info from one video

In [None]:
video = VideoInfo("thGB9IILDOw")

In [None]:
video.get_all_video_info()

# Extract all transcripts from a channel

In [None]:
channel.extract_all()

In [None]:
video = VideoInfo('3eHSnYwnX4g')

In [None]:
video.get_only_transcript()
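
Transcripts are often delivered as a list of timed segments rather than one block of text. The sketch below shows how such segments could be flattened for reading or analysis. It assumes each segment is a dictionary with `text` and `start` (seconds) keys; the actual return type of `get_only_transcript` is not shown in this notebook, and `segments_to_text` is a hypothetical helper.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[str, float]]


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments: List[Segment], with_timestamps: bool = False) -> str:
    """Join transcript segments into plain text, optionally one timed line each."""
    lines = []
    for segment in segments:
        text = " ".join(str(segment["text"]).split())  # collapse whitespace
        if not text:
            continue  # skip empty segments
        if with_timestamps:
            text = f"[{format_timestamp(float(segment['start']))}] {text}"
        lines.append(text)
    return "\n".join(lines) if with_timestamps else " ".join(lines)


# Made-up segments for illustration only.
example = [
    {"text": "Good evening\neveryone", "start": 0.0},
    {"text": "call to order", "start": 3725.4},
]
print(segments_to_text(example, with_timestamps=True))
```

Keeping timestamps is useful for locating a passage in a long meeting recording, while the plain form is more convenient for text analysis.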