# Welcome to TDM Studio #

Welcome to ProQuest's Text and Data Mining (TDM) Studio! This guide will take you through transferring your first corpus to your Jupyter Notebook. 

If you have not already done so, please generate your first dataset using our user interface, found by clicking the `+Create New Dataset` button on the TDM Studio dashboard. If you have navigated away from your dashboard, you can return to your workbench dashboard by entering the following link in your browser's address bar:

https://tdmstudio.proquest.com/workbenchdashboard

After selecting the publications you would like to use and entering your search query, please wait for the dataset to finish querying. The `Status` column will switch to `Complete` once your dataset is ready.

The following code cells will set up a simple GUI for you to visually select the datasets that you have created through the dataset creation GUI. Please run them every time you create a new dataset, so that the selection options will update.

## Step 1 - Name Your Dataset (Local) ##
Please replace the value of the variable `dataset_name` with the name you would like your dataset to appear under in your workbench. Please avoid spaces in your dataset name - you can replace any spaces with an underscore (`_`).

In [1]:
# dataset_name = input("putnamehere")
dataset_name = 'Halfmann_Abortion' # Name of your dataset
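
If you would rather not replace spaces by hand, the naming rule above can be applied in code. The following is a minimal sketch; `sanitize_dataset_name` is a hypothetical helper written for this guide, not part of TDM Studio.

```python
import re


def sanitize_dataset_name(name: str) -> str:
    """Hypothetical helper: turn each run of whitespace into one underscore."""
    cleaned = re.sub(r"\s+", "_", name.strip())
    if not cleaned:
        raise ValueError("dataset name must not be empty")
    return cleaned


dataset_name = sanitize_dataset_name("Halfmann Abortion")
print(dataset_name)  # prints Halfmann_Abortion
```

Leading and trailing spaces are dropped first, so a name typed with stray spaces still yields a clean folder name.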

## Step 2 - Select Your Dataset ##
Please select from the dropdown menu that appears after running the following cell.

In [2]:
import ipywidgets as widgets
import boto3

# Start a boto3 S3 client to list the datasets stored in ProQuest's bucket
s3 = boto3.client('s3')
bucket = 'pq-tdm-studio'
prefix = 'tdm-ale-data/232/corpus/'
response = s3.list_objects_v2(
    Bucket=bucket,
    Prefix=prefix,
    Delimiter='/')

# Getting your datasets
datasets = []
# 'CommonPrefixes' is missing from the response when no datasets exist yet
for common_prefix in response.get('CommonPrefixes', []):
    dataset = common_prefix['Prefix'].replace(prefix, '').replace('/', '')
    datasets.append(dataset)
    
# Displaying your datasets in a dropdown
select = widgets.Dropdown(
    options = datasets,
    description = 'Datasets:'
)
display(select)

# Please select the dataset you want to transfer in the dropdown below

Dropdown(description='Datasets:', options=('HalfmannADHD', 'HalfmannAbortion', 'PoliticalnewsApril91865toDec31…
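
A single `list_objects_v2` call returns at most 1,000 entries, so a workbench with many datasets may not show all of them in the dropdown. The sketch below follows every page using boto3's paginator. `list_dataset_names` is a hypothetical helper name, and the function takes the S3 client as an argument so that it can be reused with the `s3`, `bucket`, and `prefix` values from the cell above.

```python
def list_dataset_names(s3_client, bucket, prefix):
    """Collect the dataset folder names under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # A page without sub-folders has no 'CommonPrefixes' key
        for common_prefix in page.get("CommonPrefixes", []):
            folder = common_prefix["Prefix"]
            # Strip the leading prefix and the trailing slash
            names.append(folder[len(prefix):].rstrip("/"))
    return names


# Example use inside the notebook (assumes the variables from the cell above):
# datasets = list_dataset_names(s3, bucket, prefix)
```

The returned list can be passed to `widgets.Dropdown(options=...)` exactly as `datasets` is above.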

## Step 3 - Transfer Your Data ##
To transfer the data to your notebook, please run the following cells and wait for them to finish. You will find your data under `data/dataset_name`, relative to the notebook's working directory, with `dataset_name` being the name you chose in Step 1.

In [3]:
URL_of_dataset = prefix + select.value

In [4]:
URL_of_dataset

'tdm-ale-data/232/corpus/HalfmannADHD'

In [5]:
%%capture
!aws s3 cp --recursive s3://pq-tdm-studio/$URL_of_dataset/ data/$dataset_name
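
Because `%%capture` hides the output of the copy command, a failed transfer can go unnoticed. One way to check the result is to count the files that arrived. This is a minimal sketch; `count_transferred_files` is a hypothetical helper, and the path assumes the `data/` folder and the `dataset_name` value used in the cells above.

```python
from pathlib import Path


def count_transferred_files(dataset_dir):
    """Return the number of regular files under dataset_dir (0 if it is missing)."""
    root = Path(dataset_dir)
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob("*") if path.is_file())


# Assumed location: the folder written by the copy command above
print(count_transferred_files(Path("data") / "Halfmann_Abortion"))
```

A count of 0 suggests the transfer did not complete; re-running the copy cell without `%%capture` will show any error message.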

Once your data has been transferred, feel free to reuse the commands in this notebook to transfer more datasets you create! You can also navigate to the folder named `ProQuest TDM Studio Examples` to find sample scripts for interacting with your data. Our manual and more detailed instructions and guides can be found in the folder `ProQuest TDM Studio Manual`.