# Braided Channels Research Collection

This notebook downloads the dataset from [LDaCA - Braided Channels](https://data-uat.ldaca.edu.au/collection?id=arcp%3A%2F%2Fname%2Cdoi10.4225%252F01%252F4F8E1281B8E2A&_crateId=arcp%3A%2F%2Fname%2Cdoi10.4225%252F01%252F4F8E1281B8E2A) and is licensed under [MIT](https://opensource.org/license/mit).

Please acknowledge that this notebook is adapted from [GitHub - Australian-Text-Analytics-Platform/cooee](https://github.com/Australian-Text-Analytics-Platform/cooee/blob/main/cooee.ipynb).

## 1. Install/Load Packages

To install packages, please uncomment the following code.

In [1]:
# To install ldaca
# !pip install git+https://github.com/Language-Research-Technology/ldaca-py.git

# To install rocrate
# !pip install rocrate

# To install dotenv
# !pip install python-dotenv
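
Before uncommenting any of the install lines, it can help to see which packages are already importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the module names passed to it are the top-level modules this notebook imports.

```python
import importlib.util

def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Top-level modules imported by this notebook
print(missing_packages(["requests", "dotenv", "ldaca", "rocrate_lang"]))
```

An empty list suggests nothing needs installing; any name that is printed points to a package that still has to be installed.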

In [2]:
import os

import requests
from dotenv import load_dotenv          # Loads environment variables
from ldaca.ldaca import LDaCA           # Loads the LDaCA REST API wrapper
from rocrate_lang.utils import as_list  # A handy utility for converting to list

## 2. Set Your API Key

Before using [LDaCA](https://data.ldaca.edu.au/)'s APIs to download the dataset, please:

1. Register your account at [LDaCA](https://data.ldaca.edu.au).

2. Go to **User information**, generate and copy your **API key**.

3. Copy and paste your **API key** into the file `vars.env` in the same directory as this notebook.
   
   Your `vars.env` should look like the following:
   ```txt
   API_KEY=1a61****-****-****-****-********d1c5
   ```

In [3]:
load_dotenv('vars.env')             # Load the environment variables from the vars.env file
API_TOKEN = os.getenv('API_KEY')    # Store your environment variable in this notebook
if not API_TOKEN:
    print("Get a token from the portal, set a variable in the vars.env file named API_KEY, then restart the kernel.")
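
The check above only prints a reminder, so later cells would still run with an empty token. A stricter variant, sketched here under the assumption that failing early is preferred, raises an error instead. `require_api_key` and `mask` are hypothetical helpers and not part of `ldaca` or `python-dotenv`.

```python
import os

def require_api_key(env=os.environ, name="API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to vars.env and restart the kernel.")
    return key

def mask(key, visible=4):
    """Hide all but the first and last `visible` characters, for safe printing."""
    if len(key) <= 2 * visible:
        return "*" * len(key)
    return key[:visible] + "*" * (len(key) - 2 * visible) + key[-visible:]

# Example with a made-up key (not a real credential)
print(mask(require_api_key({"API_KEY": "abcd1234efgh5678"})))
# prints: abcd********5678
```

Printing only the masked form confirms that a key was loaded without exposing it in the saved notebook output.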


## 3. Find All Available Resource Types & Metadata

In [4]:
LDACA_API = 'https://data.ldaca.edu.au/api'                # DO NOT CHANGE
COLLECTION_ID = 'arcp://name,hdl10.4225~01~4F8E1281B8E2A'  # Change to the collection you want to download

# Get the ro-crate metadata. This will create a JSON file under the directory 'metadata'
ldaca = LDaCA(url=LDACA_API, token=API_TOKEN, data_dir='metadata')
ldaca.retrieve_collection(collection=COLLECTION_ID, collection_type='Collection', data_dir='metadata')

# Inspect the metadata
metadata = ldaca.crate
print(metadata)

<rocrate_lang.rocrate_plus.ROCratePlus object at 0x12035cec0>


In [5]:
# Collect the types of all entities in the crate
types = []
for entity in ldaca.crate.contextual_entities + ldaca.crate.data_entities:
    types.extend(as_list(entity.type))  # entity.type may be a single value or a list

# Show the unique types, in first-seen order
list(dict.fromkeys(types))

['OrganizationReuseLicense',
 'RepositoryObject',
 'Organization',
 'Person',
 'Language',
 'PropertyValue',
 'Speaker',
 'Geometry',
 'DefinedTerm',
 'SoftwareSourceCode',
 'CreateAction',
 'File',
 'Text',
 'Video']
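
The list shows which types occur but not how often. A possible extension, sketched here on hand-written sample values rather than the real crate, tallies them with `collections.Counter`. `count_types` and the local `to_list` are illustrative stand-ins; in the notebook, `as_list(entity.type)` would play the role of `to_list`.

```python
from collections import Counter

def to_list(value):
    """Stand-in for as_list: wrap a single value in a list, leave lists unchanged."""
    return value if isinstance(value, list) else [value]

def count_types(type_values):
    """Count how many entities carry each type."""
    counts = Counter()
    for value in type_values:
        counts.update(to_list(value))
    return counts

# Hand-written sample; real values would come from entity.type
sample = ["Person", ["RepositoryObject"], ["File", "Video"], "Person"]
print(count_types(sample).most_common())
```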

## 4. Build URLs for Main Resources

In [6]:
# Specify the entity type under which the main resources are stored
PRIMARY_OBJECT = 'RepositoryObject'

# Build URLs for all main resources
urls = {}  # key: name, value: url
for entity in ldaca.crate.contextual_entities + ldaca.crate.data_entities:
    if PRIMARY_OBJECT in as_list(entity.type):
        item = ldaca.crate.dereference(entity.id).as_jsonld()['hasPart']
        url = item[0]['@id'] if isinstance(item, list) else item['@id']
        name = url.split('/')[-1]
        urls[name] = url
        print(name, url)

29880-0001.pdf https://data.ldaca.edu.au/api/object/arcp%3A%2F%2Fname%2Chdl10.4225~01~4F8E1281B8E2A/fromSLQ/Series%201-Edna%20Jessop_2%20June%202000/29880-0001.pdf
29880-0003.pdf https://data.ldaca.edu.au/api/object/arcp%3A%2F%2Fname%2Chdl10.4225~01~4F8E1281B8E2A/fromSLQ/Series%202-Elizabeth%20'Bid'%20Campbell_2%20June%202000/29880-0003.pdf
29880-0004.pdf https://data.ldaca.edu.au/api/object/arcp%3A%2F%2Fname%2Chdl10.4225~01~4F8E1281B8E2A/fromSLQ/Series%203-Joslin%20Eatts_3%20June%202000/29880-0004.pdf
09_BC_DV_PTB.mov.mp4 https://data.ldaca.edu.au/api/object/arcp%3A%2F%2Fname%2Chdl10.4225~01~4F8E1281B8E2A/09_BC_DV_PTB.mov.mp4
09_BC_DV_PTC.mov.mp4 https://data.ldaca.edu.au/api/object/arcp%3A%2F%2Fname%2Chdl10.4225~01~4F8E1281B8E2A/09_BC_DV_PTC.mov.mp4
Transcript%2029880-0005.doc https://data.ldaca.edu.au/api/object/arcp%3A%2F%2Fname%2Chdl10.4225~01~4F8E1281B8E2A/fromSLQ/Series%204-Liz%20Debney_4%20June%202000/Transcript%2029880-0005.doc
12_BC_DV_PTB.mov.mp4 https://data.ldaca.edu.au/ap
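
Some names above still carry URL percent-encoding, for example `Transcript%2029880-0005.doc`, because they are taken directly from the last URL segment. If decoded file names are preferred, a sketch along these lines could be used. `filename_from_url` is a hypothetical helper built on the standard library's `urllib.parse`, and the URL in the example is made up for illustration.

```python
from urllib.parse import unquote, urlsplit

def filename_from_url(url):
    """Return the decoded last path segment of a URL."""
    # Split on '/' before decoding so an encoded slash (%2F) cannot change the segment boundaries
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return unquote(last_segment)

# Made-up URL on example.org, ending in a file name seen in the output above
print(filename_from_url("https://example.org/api/object/demo/Transcript%2029880-0005.doc"))
# prints: Transcript 29880-0005.doc
```

Names decoded this way would differ from the encoded names the notebook uses as keys in `urls`, so the choice should be applied consistently.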

## 5. Download Resources from URLs

In [7]:
# Specify a location you want to download all the resources to
SAVE_PATH = 'Braided-Channels'

# Download resources by sending requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}  # The same token is used for every request
if SAVE_PATH is not None:
    os.makedirs(SAVE_PATH, exist_ok=True)  # Create the folder if it does not exist yet
for name, url in urls.items():
    # Send a GET request to the URL
    response = requests.get(url, headers=headers)
    # Stop on HTTP errors (e.g. an invalid token) instead of saving an error page
    response.raise_for_status()
    full_path = name if SAVE_PATH is None else os.path.join(SAVE_PATH, name)
    # Write the content of the response to a file
    with open(full_path, 'wb') as f:
        f.write(response.content)
    print(f"Downloaded as {full_path}")

Downloaded as Braided-Channels/29880-0001.pdf
Downloaded as Braided-Channels/29880-0003.pdf
Downloaded as Braided-Channels/29880-0004.pdf
Downloaded as Braided-Channels/09_BC_DV_PTB.mov.mp4
Downloaded as Braided-Channels/09_BC_DV_PTC.mov.mp4
Downloaded as Braided-Channels/Transcript%2029880-0005.doc
Downloaded as Braided-Channels/12_BC_DV_PTB.mov.mp4
Downloaded as Braided-Channels/12_BC_DV_PTC.mov.mp4
Downloaded as Braided-Channels/13_BC_DV_PTA.mov.mp4
Downloaded as Braided-Channels/13_BC_DV_PTB.mov.mp4
Downloaded as Braided-Channels/13_BC_DV_PTC.mov.mp4
Downloaded as Braided-Channels/14_BC_DV_PTA.mov.mp4
Downloaded as Braided-Channels/29880-0013.pdf
Downloaded as Braided-Channels/29880-0014.pdf
Downloaded as Braided-Channels/17_BC_DV_PTC.mov.mp4
Downloaded as Braided-Channels/29880-0016.pdf
Downloaded as Braided-Channels/19_BC_DV_PTB.mov.mp4
Downloaded as Braided-Channels/20_BC_DV_PTB.mov.mp4
Downloaded as Braided-Channels/29880-0020.pdf
Downloaded as Braided-Channels/29880-0023.pdf
D