In [1]:
import requests
import csv

In [2]:
# Build the column-name string from the available fields: http://www.rcsb.org/pdb/results/reportField.do
# Custom Report Web Services general info: http://www.rcsb.org/pdb/software/wsreport.do
se = "ndbId"
string_names = "classification,experimentalTechnique,macromoleculeType,residueCount,resolution,"+\
                "structureMolecularWeight,"+\
                "crystallizationMethod,crystallizationTempK,densityMatthews,densityPercentSol,"+\
                "pdbxDetails,phValue,publicationYear"
            
sequences_string_names = "sequence,residueCount,macromoleculeType"

`requests.get()` sends a GET request.

`params` is an optional keyword argument. It accepts a dictionary, a list of tuples, or bytes, and its contents are sent in the query string of the request.

In [4]:
#Main Pull
payload = {'pdbids': '*', 'service': 'wsfile', 'format': 'csv', 'primaryOnly': '1', 'CustomReportColumns': string_names}
r = requests.get('http://www.rcsb.org/pdb/rest/customReport', params=payload)

As we can see here, our `params` dictionary has been appended to the URL as a properly encoded query string.

In [5]:
r.url

'http://www.rcsb.org/pdb/rest/customReport?pdbids=%2A&service=wsfile&format=csv&primaryOnly=1&CustomReportColumns=classification%2CexperimentalTechnique%2CmacromoleculeType%2CresidueCount%2Cresolution%2CstructureMolecularWeight%2CcrystallizationMethod%2CcrystallizationTempK%2CdensityMatthews%2CdensityPercentSol%2CpdbxDetails%2CphValue%2CpublicationYear'
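The percent-encoding visible in this URL (`*` becoming `%2A`, `,` becoming `%2C`) can be reproduced offline with the standard library. This is only a sketch: `urllib.parse.urlencode` is used here as a stand-in for what `requests` does with `params`, and `example_payload` is a shortened, illustrative version of the payload above.

```python
from urllib.parse import urlencode

# Shortened, illustrative payload (not the full one used in the pull above).
example_payload = {
    'pdbids': '*',
    'format': 'csv',
    'CustomReportColumns': 'classification,resolution',
}

# urlencode joins key=value pairs with '&' and percent-encodes
# reserved characters such as '*' and ','.
query_string = urlencode(example_payload)
print(query_string)
```

The keys appear in the same order as in the dictionary, which matches the ordering seen in `r.url`.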

The `.text` property of our `requests.models.Response` object returns the content of the response as a Unicode string.

We use the `.splitlines()` method to return a list of the lines in the string, breaking at line boundaries. We then index by 0 to return only the first line.

In [10]:
r.text.splitlines()[0]

'structureId,classification,experimentalTechnique,macromoleculeType,residueCount,resolution,structureMolecularWeight,crystallizationMethod,crystallizationTempK,densityMatthews,densityPercentSol,pdbxDetails,phValue,publicationYear'
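To make the behaviour of `.splitlines()` concrete, here is a small sketch on a made-up string (the ID `1ABC` and the values are invented for illustration). It also shows one reason to prefer `.splitlines()` over `.split('\n')` for this kind of text: mixed line endings are handled and no trailing empty entry is produced.

```python
# Made-up sample text with mixed line endings and a trailing newline.
sample_text = "structureId,resolution\r\n1ABC,1.9\n2XYZ,2.25\n"

lines = sample_text.splitlines()      # breaks at \r\n and \n alike
header = lines[0]                     # first line only, as in the cell above

naive_lines = sample_text.split('\n')  # leaves '\r' behind and adds a trailing ''

print(lines)
print(naive_lines)
```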

Using Python's string method `.split(sep)`, we get a list of the substrings of the string, using `sep` as the delimiter.

In [12]:
string_names.split(",")

['classification',
 'experimentalTechnique',
 'macromoleculeType',
 'residueCount',
 'resolution',
 'structureMolecularWeight',
 'crystallizationMethod',
 'crystallizationTempK',
 'densityMatthews',
 'densityPercentSol',
 'pdbxDetails',
 'phValue',
 'publicationYear']
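One possible use of this list is a sanity check that the header row returned by the service matches the columns that were requested. The sketch below assumes, based on the header shown earlier, that the service prepends a `structureId` column; `requested` and `returned_header` are shortened stand-ins for `string_names` and `r.text.splitlines()[0]`.

```python
# Shortened stand-ins for string_names and the first line of r.text.
requested = "classification,experimentalTechnique,resolution"
returned_header = "structureId,classification,experimentalTechnique,resolution"

# Assumption: the response header is 'structureId' followed by the requested columns.
expected_columns = ["structureId"] + requested.split(",")
header_matches = returned_header.split(",") == expected_columns

print(header_matches)
```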

In [16]:
output_reader = csv.reader(r.text.splitlines())

The csv module's `reader(iterable)` function returns an iterator. Each iteration returns one row of the CSV input as a list of strings (a row can span multiple input lines).
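A short sketch of why `csv.reader` is used rather than splitting each line on commas: free-text fields such as `pdbxDetails` can contain commas inside quotes. The sample row below is made up for illustration.

```python
import csv

# Made-up sample lines; the second field of the data row contains a comma.
sample_lines = [
    'structureId,pdbxDetails,phValue',
    '1ABC,"PEG 4000, 0.1 M Tris",7.5',
]

naive = sample_lines[1].split(",")       # splits inside the quoted field
rows = list(csv.reader(sample_lines))    # respects the quoting

print(naive)
print(rows[1])
```

The naive split yields four pieces for a three-column row, while `csv.reader` returns the three intended fields.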

The csv module's `writer(fileobj)` function returns a writer object. The usual pattern is: `for row in sequence: csv_writer.writerow(row)`

`.writerow(iterable)` constructs and writes a CSV record from an iterable of fields.  Non-string elements will be converted to string.
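The conversion and quoting done by `.writerow()` can be seen without touching the disk by writing into an in-memory buffer. This is a sketch with made-up values; `io.StringIO` stands in for the real output file.

```python
import csv
import io

buffer = io.StringIO()                 # in-memory stand-in for a file
demo_writer = csv.writer(buffer)

demo_writer.writerow(['structureId', 'resolution', 'pdbxDetails'])
# The float is converted to a string; the field containing a comma is quoted.
demo_writer.writerow(['1ABC', 1.9, 'PEG 4000, pH 7.5'])

written = buffer.getvalue()
print(repr(written))
```

Note that the writer terminates each record with `\r\n` by default.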

In [17]:
#writing the main pull
output_reader = csv.reader(r.text.splitlines())
with open('pdb_data_no_dups.csv', 'w', newline='') as csvfile:  # newline='' avoids blank rows on Windows
    csv_writer = csv.writer(csvfile)
    for row in output_reader:
        csv_writer.writerow(row)

In [18]:
len(r.text.splitlines())

162351

In [19]:
#sequence pull
payload_seq = {'pdbids': '*', 'service': 'wsfile', 'format': 'csv', 'primaryOnly': '1', 'CustomReportColumns': sequences_string_names}
r_seq = requests.get('http://www.rcsb.org/pdb/rest/customReport', params=payload_seq)

In [25]:
#write sequence pull
output_reader_seq = csv.reader(r_seq.text.splitlines())
with open('pdb_data_seq.csv', 'w', newline='') as csvfile:  # newline='' avoids blank rows on Windows
    csv_writer_seq = csv.writer(csvfile)
    for row in output_reader_seq:
        csv_writer_seq.writerow(row)
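Since the main pull and the sequence pull repeat the same parse-and-write steps, they could be folded into one helper. The function below is a hypothetical sketch (`save_csv_text` is not part of the original notebook); it takes the response text and a path, and is exercised here on made-up data in a temporary directory.

```python
import csv
import os
import tempfile

def save_csv_text(text, path):
    """Parse CSV text and write it to `path`; return the number of rows written."""
    reader = csv.reader(text.splitlines())
    row_count = 0
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in reader:
            writer.writerow(row)
            row_count += 1
    return row_count

# Usage with made-up data; with a real response this would be
# save_csv_text(r_seq.text, 'pdb_data_seq.csv').
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
demo_rows = save_csv_text("structureId,sequence\n1ABC,MKV\n", demo_path)
print(demo_rows)
```

Returning the row count gives a quick check against `len(r.text.splitlines())`.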

# Comma-separated report names (each continued line must end with a comma)
reports = "StructureSummary,Sequence,Ligands,BindingAffinity,BiologicalDetails,ClusterEntity,"+\
    "Domains,Crystallization,UnitCellDimensions,DataCollectionDetails,RefinementDetails,"+\
    "refinementParameters,NmrSoftware,NmrSpectrometer,NMRExperimentalSampleConditions,NmrRepresentative,"+\
    "NMRRefinement,NmrEnsemble,EMStructure,Citation,OtherCitations,SGProject"

payload_all = {'pdbids': '*', 'service': 'wsfile', 'format': 'csv', 'primaryOnly': '1', 'reportName': reports}
r_all = requests.get('http://www.rcsb.org/pdb/rest/customReport', params=payload_all)
output_reader_all = csv.reader(r_all.text.splitlines())
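# Text mode ('w'), not 'wb': csv.writer writes str in Python 3
with open('pdb_data_all.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    # Iterate over this pull's reader, not the reader from the main pull
    for row in output_reader_all:
        csv_writer.writerow(row)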

r_all.url