# Google Colab Setup for Social Media Scraper

This notebook walks through setting up and running the social media scraper in a Google Colab environment, step by step.

## 1. Installation of System Dependencies
First, install the necessary system dependencies:

```python
!apt-get update
!apt-get install -y libnss3-tools
```

## 2. Cloning the Repository
Clone the GitHub repository:

```python
!git clone https://github.com/pwklam/scrape_chinese_social_media.git
```

## 3. Installing Python Packages
Install the required Python packages using pip:

```python
!pip install -r scrape_chinese_social_media/requirements.txt
```

## 4. Configuring Playwright
Install Playwright and download the browser binaries it uses for browser automation:

```python
!pip install playwright
!playwright install
```

## 5. Creating `urls.txt`
Create a file named `urls.txt` with the URLs to scrape:

```python
with open('urls.txt', 'w') as f:
    f.write('https://example.com\n')  # Add your URLs here, one per line
```
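When there are several URLs, writing them from a list is less error-prone than hand-editing string literals. The following is a minimal sketch, not part of the repository: `write_urls` is a hypothetical helper, and it assumes the scraper expects one URL per line in `urls.txt`.

```python
from urllib.parse import urlparse


def write_urls(urls, path="urls.txt"):
    """Write unique, well-formed http(s) URLs to `path`, one per line.

    Hypothetical helper for illustration; returns the URLs that were kept.
    """
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only absolute http(s) URLs that have not been seen yet.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in seen:
            seen.add(url)
            kept.append(url)
    with open(path, "w", encoding="utf-8") as f:
        for url in kept:
            f.write(url + "\n")
    return kept
```

For example, `write_urls(['https://example.com', 'https://example.com'])` writes a single line to `urls.txt`, because the duplicate is dropped.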

## 6. Running the Scraper
Execute the scraper script:

```python
!python scrape_chinese_social_media/scraper.py
```
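A `!python` cell does not raise a Python exception when the script exits with an error, so later cells may run against missing output. As a hedged alternative, the sketch below runs a script through `subprocess` and raises if it fails. `run_script` is a hypothetical helper, not something the repository provides.

```python
import subprocess
import sys


def run_script(script_path, *args):
    """Run a Python script with the current interpreter; raise if it fails.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(
        [sys.executable, script_path, *args],
        capture_output=True,  # collect stdout and stderr instead of streaming them
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        raise RuntimeError(f"{script_path} exited with code {result.returncode}")
    return result
```

In the notebook this could be called as `run_script('scrape_chinese_social_media/scraper.py')`. Note that output appears only after the script finishes, unlike the live output of a `!python` cell.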

## 7. Exporting to Excel
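After scraping, convert the results to an Excel file. This assumes the scraper wrote its output to `results.csv` in the working directory; writing `.xlsx` files with pandas also requires the `openpyxl` package: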

```python
import pandas as pd
results = pd.read_csv('results.csv')
results.to_excel('results.xlsx', index=False)
```

## 8. Downloading Results
Download the results file to your local machine:

```python
from google.colab import files
files.download('results.xlsx')
```

## 9. Handling Headless Browser Environment
Google Colab has no display, so the browser must run in headless mode.
Features that depend on a visible browser window may not work as expected in this environment. Refer to Playwright's documentation for the limitations.
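To check that a headless browser actually starts in Colab, a small smoke test can help. The sketch below is an assumption-laden example, not the scraper's own code: `fetch_title` is a hypothetical function, and it uses Playwright's async API because a notebook already runs an asyncio event loop, which the sync API refuses to run inside.

```python
async def fetch_title(url):
    """Open `url` in headless Chromium and return the page title.

    Hypothetical smoke test; the scraper itself may launch the browser differently.
    """
    # Imported lazily so the function can be defined before Playwright is installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)  # no display needed
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await browser.close()
```

In a Colab cell, call it with top-level `await`, for example `title = await fetch_title('https://example.com')`. If the launch fails, the error message usually points to missing browser binaries or system libraries.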