<b>GOOGLE DRIVE SCRAPER</b>
By Indriatmoko

This Python script scrapes a Google Drive folder, including its subfolders, and produces a list of file paths and file URLs for the files it contains. To use this script, follow these steps:

1. Create client_secrets.json using the Google Drive API:

   a. Create a new project in the Google Cloud console: go to console.cloud.google.com and log in with your Google account.

   b. Select "APIs & Services," click "Enable APIs and Services," and search for the Google Drive API. Select the Google Drive API from the search results and choose "Enable."

   c. On the left-hand menu, select "Credentials," then click "Create credentials." Choose "User data" as the credential type. In the OAuth Consent Screen, enter your email and app name. Under "Scopes," click "Add or Remove Scopes." In the pop-up window, type "Google Drive" in the search bar, select all relevant Google Drive-related scopes, click "Update," and then "Save and Continue." For OAuth Client ID, choose "Desktop app," and click "Create." In the final credential step, click "Download JSON" and then "Done."

2. Determine your working directory, for example with the pwd command in a Jupyter notebook or os.getcwd() in Python.

3. Rename the downloaded JSON file to client_secrets.json and move it to the directory found in step 2.

4. On the left-hand menu, select "OAuth consent screen" and choose "Add user." Enter your email and click "Save."

5. Get the folder ID by opening Google Drive and looking at the folder's URL. (For example, a Google Drive folder URL is https://drive.google.com/drive/folders/xxxxxxxxx, where xxxxxxxxx is the folder ID.)

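6. Set folder_id in main() to your folder ID, make sure the pydrive, pandas, and openpyxl packages are installed, and run the script below.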

7. If a pop-up browser window requests authentication access to Google Drive, allow it by clicking "Continue."

8. The scraper's output is written to file_details.xlsx in the working directory found in step 2.
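
Steps 2, 3, and 5 can be checked with a few lines of Python before running the scraper. The following is a minimal sketch, not part of the original script: the helper names extract_folder_id and has_client_secrets are hypothetical, and the URL pattern assumes the folder ID is whatever follows /folders/ in the address.

```python
import os
import re

def extract_folder_id(url):
    """Return the folder ID from a Drive folder URL, or None if none is found."""
    # Assumption: the ID is the path segment right after '/folders/',
    # ending at the next '/', '?' or '#'.
    match = re.search(r"/folders/([^/?#]+)", url)
    return match.group(1) if match else None

def has_client_secrets(directory=None):
    """Check that client_secrets.json sits in the given (or current) directory."""
    directory = directory or os.getcwd()
    return os.path.isfile(os.path.join(directory, "client_secrets.json"))

if __name__ == "__main__":
    print("Working directory:", os.getcwd())
    print("client_secrets.json found:", has_client_secrets())
    print("Folder ID:", extract_folder_id("https://drive.google.com/drive/folders/xxxxxxxxx"))
```

The value returned by extract_folder_id is what goes into folder_id in the script's main() function.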

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import pandas as pd

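def authenticate_google_drive():
    """Authenticate via OAuth in a local browser and return a GoogleDrive client."""
    # GoogleAuth reads client_secrets.json from the current working directory.
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()
    drive = GoogleDrive(gauth)
    return drive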

def get_file_details(folder_id, drive, folder_path=''):
    """Recursively collect (path, URL) tuples for every file under folder_id."""
    folder_mime = 'application/vnd.google-apps.folder'
    # A single query returns both the files and the subfolders of this folder.
    query = f"'{folder_id}' in parents and trashed=false"
    item_list = drive.ListFile({'q': query}).GetList()
    file_details = []
    subfolders = []

    for item in item_list:
        if item['mimeType'] == folder_mime:
            subfolders.append(item)
        else:
            file_details.append((folder_path + '/' + item['title'], item['alternateLink']))

    # Descend into each subfolder, extending the path prefix along the way.
    for subfolder in subfolders:
        subfolder_path = folder_path + '/' + subfolder['title']
        file_details.extend(get_file_details(subfolder['id'], drive, folder_path=subfolder_path))

    return file_details

def main():
    # Replace this placeholder with the folder ID taken from the folder's URL.
    folder_id = 'Set-Your-Folder-ID'

    drive = authenticate_google_drive()
    file_details = get_file_details(folder_id, drive)

    # One row per file: its path within the folder and its Google Drive link.
    df = pd.DataFrame(file_details, columns=['Path', 'Google Drive Link'])
    output_file = 'file_details.xlsx'
    df.to_excel(output_file, index=False)

    print(f"File details saved to '{output_file}'.")

if __name__ == "__main__":
    main()
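
To see the shape of the output without authenticating, the traversal can be tried against an in-memory stand-in for the drive object. The sketch below is only an illustration under stated assumptions: FakeDrive, FakeListing, list_tree, and the sample tree are hypothetical, and the fake mimics only the ListFile(...).GetList() calls that the script makes. list_tree repeats the traversal idea of get_file_details so that the block runs on its own.

```python
FOLDER_MIME = 'application/vnd.google-apps.folder'

class FakeListing:
    """Hypothetical stand-in for the object returned by drive.ListFile()."""
    def __init__(self, items):
        self._items = items

    def GetList(self):
        return self._items

class FakeDrive:
    """Hypothetical in-memory drive: maps a folder ID to its child items."""
    def __init__(self, tree):
        self._tree = tree

    def ListFile(self, params):
        # The script's queries start with "'<folder_id>' in parents";
        # recover the ID from between the first pair of single quotes.
        folder_id = params['q'].split("'")[1]
        items = self._tree.get(folder_id, [])
        # Honor a folder-only filter if the query contains one.
        if "mimeType='" + FOLDER_MIME + "'" in params['q']:
            items = [i for i in items if i['mimeType'] == FOLDER_MIME]
        return FakeListing(items)

def list_tree(folder_id, drive, folder_path=''):
    """Collect (path, URL) tuples recursively, files first, then subfolders."""
    query = f"'{folder_id}' in parents and trashed=false"
    items = drive.ListFile({'q': query}).GetList()
    details = []
    subfolders = []
    for item in items:
        if item['mimeType'] == FOLDER_MIME:
            subfolders.append(item)
        else:
            details.append((folder_path + '/' + item['title'], item['alternateLink']))
    for sub in subfolders:
        details.extend(list_tree(sub['id'], drive, folder_path + '/' + sub['title']))
    return details

# Made-up sample data: one file at the top level and one inside a subfolder.
tree = {
    'root': [
        {'id': 'f1', 'title': 'a.txt', 'mimeType': 'text/plain',
         'alternateLink': 'https://example.com/a'},
        {'id': 'sub', 'title': 'docs', 'mimeType': FOLDER_MIME,
         'alternateLink': 'https://example.com/docs'},
    ],
    'sub': [
        {'id': 'f2', 'title': 'b.pdf', 'mimeType': 'application/pdf',
         'alternateLink': 'https://example.com/b'},
    ],
}

result = list_tree('root', FakeDrive(tree))
for path, url in result:
    print(path, url)
```

Each tuple becomes one row of file_details.xlsx, with the path in the first column and the link in the second.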
