GigaDB API #27

Open · pli888 opened this issue on May 17, 2016 · 17 comments · 5 participants

pli888 (Member) commented May 17, 2016

I have merged the API code into the develop branch, but I do not know how to see it working in the Vagrant VM. For example, I get an error message when I point my web browser at:

http://127.0.0.1:9170/api/dataset/100005

@jessesiu can you help please?

pli888 (Member) commented May 19, 2016

I'm not sure what I did wrong before but I managed to get the API working.

When a user enters an API URL that cannot return any results, I think the API should return an empty string. At the moment, the API returns a GigaDB web page with an error 500 "Trying to get property of non-object" message.
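"Trying to get property of non-object" is what PHP reports when code reads a field from a lookup that returned nothing (the `Yii::app()` path in a later comment shows the site runs on the Yii PHP framework). As an illustration in Python, not the project's PHP code, here is a minimal sketch of the guard being asked for: check that the dataset lookup found something before touching its fields, and return an empty string otherwise. `render_dataset` and the in-memory `datasets` dict are hypothetical stand-ins for the real controller and database query.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr


def render_dataset(doi: str, datasets: Dict[str, Dict[str, str]]) -> str:
    """Return dataset XML for `doi`, or "" when no dataset matches.

    Hypothetical sketch: `datasets` stands in for the real database lookup.
    """
    record: Optional[Dict[str, str]] = datasets.get(doi)
    if record is None:
        # Guard the lookup result before reading any field; reading a field
        # of a missing record is what triggers "non-object" style errors.
        return ""
    title = escape(record.get("title", ""))
    return f"<dataset doi={quoteattr(doi)}><title>{title}</title></dataset>"
```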

only1chunts added the API label May 20, 2016

only1chunts (Member) commented Oct 11, 2016

I've been testing the API and noticed these points:
1 -
http://54.179.189.148/api/search?keyword=outbreak
returns 2 datasets and correctly gives the same output as
http://54.179.189.148/api/search?keyword=outbreak&result=dataset
However, with result=sample,
http://54.179.189.148/api/search?keyword=outbreak&result=sample
returns 2 samples. This is incorrect: it should give the same result as the web search
http://54.179.189.148/search/new?keyword=outbreak&type%5B%5D=sample
i.e. there are NO samples in the database that contain the keyword "outbreak", so no results should be returned.

2 -
As in (1), the results are incorrect for the search
http://54.179.189.148/api/search?keyword=chimp
It only returns the top-level dataset information, even though several samples, and even files, contain the word "chimp" and should be displayed.

3 -
When only the sample (or file or experiment) results are returned, the user should be able to link them immediately back to their DOIs.
Could an attribute such as "dataset_DOI=xxxxxx" be added somewhere in the sample XML, e.g. by changing
<sample submission_date="2013-04-08" id="527">
to
<sample submission_date="2013-04-08" id="527" doi="100051">
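Here is a minimal sketch of point 3 using Python's standard `xml.etree.ElementTree`. It parses a `sample` element like the one above and adds a `doi` attribute so each sample can be traced back to its dataset. The helper `add_dataset_doi` is hypothetical, and the sketch only shows the XML change, not how the API finds the DOI for a given sample.

```python
import xml.etree.ElementTree as ET


def add_dataset_doi(sample_xml: str, doi: str) -> str:
    """Return `sample_xml` with a doi="..." attribute added to the <sample> element."""
    sample = ET.fromstring(sample_xml)
    # Existing attributes (submission_date, id) are kept; doi is added alongside them.
    sample.set("doi", doi)
    return ET.tostring(sample, encoding="unicode")


print(add_dataset_doi('<sample submission_date="2013-04-08" id="527" />', "100051"))
```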

4 - ERROR
When I search for anything with &result=file, I get an ugly error message:
http://54.179.189.148/api/search?keyword=chimp&result=file

This page contains the following errors:
error on line 31 at column 8: Opening and ending tag mismatch: meta line 0 and head
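The message in point 4 is an XML well-formedness error. An HTML-style `<meta ...>` tag is valid HTML without a closing tag, but in XML it stays open, so when the parser reaches `</head>` the tags do not match. This Python sketch reproduces that class of error on a small hypothetical snippet and shows a check (`xml_error`, a hypothetical helper) that could be run on API output before it is served. It illustrates the cause and is not necessarily the fix that was applied.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_error(text: str) -> Optional[str]:
    """Return the parser's error message if `text` is not well-formed XML, else None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# HTML-style <meta> is never closed, so </head> does not match the open tag.
broken = '<html><head><meta charset="utf-8"></head></html>'
# Self-closing the tag makes the same snippet well-formed XML.
fixed = '<html><head><meta charset="utf-8"/></head></html>'

print(xml_error(broken))
print(xml_error(fixed))
```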

jessesiu (Contributor) commented Oct 12, 2016

1. Regarding the keyword search with result=sample: because most keyword searches return a lot of data, the result parameter is meant to return only part of each dataset, e.g. the dataset info, file info or sample info.

http://54.179.189.148/api/search?keyword=outbreak&result=sample

will return the samples from the datasets that contain the keyword "outbreak".

If users want to search for samples, they can use taxno or taxname, which is more accurate, e.g.
http://54.179.189.148/api/search?taxno=108931&result=sample

2. http://54.179.189.148/api/search?keyword=chimp&result=sample
http://54.179.189.148/api/search?keyword=chimp&result=file

3. Added this attribute.

4. Fixed this error.
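The queries in point 1 of this reply are plain GET requests. Their query string combines a search criterion (`keyword` or `taxno`) with an optional `result`. Below is a small Python sketch of a URL builder for them, using `urllib.parse.urlencode`. `search_url` is a hypothetical helper, and the base address is the beta server used in this thread, which may change.

```python
from typing import Optional
from urllib.parse import urlencode

# Beta server used in this thread (assumption: the address may change).
BASE = "http://54.179.189.148/api/search"


def search_url(result: Optional[str] = None, **criteria: str) -> str:
    """Build an /api/search URL from criteria such as keyword= or taxno=.

    `result` (e.g. dataset, sample, file) is appended last when given.
    """
    params = dict(criteria)
    if result is not None:
        params["result"] = result
    return f"{BASE}?{urlencode(params)}"
```

For example, `search_url(taxno="108931", result="sample")` rebuilds the taxno query shown above.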

pli888 (Member) commented Oct 12, 2016

Is there any documentation on how to use the API? It should be available on a web page via the Help button, but clicking it produces this error message:

require(Yii::app()->basePath/../files/html/help.html): failed to open stream: Permission denied

jessesiu (Contributor) commented Oct 12, 2016

Updated the documentation on the help page.

only1chunts (Member) commented Oct 12, 2016

@jessesiu regarding (1) above, the point is that there are NO samples that contain the keyword "outbreak", yet the API sample search returns 2 results!
I believe the search
http://54.179.189.148/api/search?keyword=outbreak&result=sample
should give the same results as
http://54.179.189.148/search/new?keyword=outbreak&type%5B%5D=sample

i.e. they are the same search, done via the API and via the website, but currently they give different results.

jessesiu (Contributor) commented Oct 12, 2016

The result parameter is for returning different parts of the dataset information.
I have just added the type parameter (an array), which works the same way as in the website search: it searches for the keyword in different areas, and by default it searches in the dataset.
http://54.179.189.148/api/search?keyword=outbreak&type[]=sample
To search for the keyword in both samples and files:
http://54.179.189.148/api/search?keyword=outbreak&type[]=sample&type[]=file
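Since `type[]` is repeated once per area searched, the query string has to carry the same key several times. In Python, one way to build it is to pass a list of pairs to `urllib.parse.urlencode`. That keeps repeated keys and percent-encodes the brackets as `%5B%5D`, the same form the website search URLs in this thread use. PHP normally decodes either form into an array. `keyword_search_url` is a hypothetical helper.

```python
from typing import Iterable
from urllib.parse import urlencode

# Beta server used in this thread (assumption: the address may change).
BASE = "http://54.179.189.148/api/search"


def keyword_search_url(keyword: str, types: Iterable[str] = ()) -> str:
    """Build a keyword search URL with one type[] pair per area to search."""
    pairs = [("keyword", keyword)] + [("type[]", t) for t in types]
    # A list of pairs keeps repeated keys; "[" and "]" become %5B and %5D.
    return f"{BASE}?{urlencode(pairs)}"
```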

only1chunts (Member) commented Oct 13, 2016

After conversations with Jesse, he has now changed the search so that it looks in the top-level dataset, sample or file according to the result parameter.
http://54.179.189.148/api/search?keyword=outbreak&result=sample
now returns error 404 Not Found and shows:
"No items where found for keyword outbreak in sample, Please search in dataset or file."
whereas
http://54.179.189.148/api/search?keyword=outbreak
or
http://54.179.189.148/api/search?keyword=outbreak&result=dataset
return two records.

This behavior is now consistent with the web search, so points 1, 3 and 4 have all been dealt with.
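With this change, a client has to treat 404 as "no matches" rather than as a failure. Here is a minimal client-side sketch in Python using the standard `urllib`. It assumes, as described above, that the 404 response body carries the explanatory message. `fetch_search` is a hypothetical name, and the `opener` parameter exists only so the function can be tried without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_search(
    url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 30.0,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (xml, None) on success, or (None, message) when nothing matched.

    Any HTTP status other than 404 is treated as a real error and re-raised.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8"), None
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # 404 means "no items found"; the body says where to search instead.
            return None, err.read().decode("utf-8", errors="replace")
        raise
```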

only1chunts (Member) commented Oct 13, 2016

Regarding point 2 from above:
2 -
As in (1), the results are incorrect for the search
http://54.179.189.148/api/search?keyword=chimp
It only returns the top-level dataset information, even though several samples, and even files, contain the word "chimp" and should be displayed.

This search still only returns "dataset" results, i.e. it gives the same results as
http://54.179.189.148/api/search?keyword=chimp&result=dataset
It should return all dataset, sample and file results; compare the web search for chimp:
http://54.179.189.148/search/new?keyword=chimp&yt0=Search

jessesiu (Contributor) commented Oct 13, 2016

OK, it will now return all dataset, sample and file results.

only1chunts (Member) commented Oct 13, 2016

That has now allowed me to spot another minor issue with the XML structure.

Each <gigadb_entry> should be a separate dataset, i.e. it should group the dataset, samples, files and experiments from a single dataset into one chunk.

So for the chimp example you would have 2 distinct <gigadb_entry> sections under the top-level <gigadb_entries> root.
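Here is a sketch of the requested nesting with Python's `xml.etree.ElementTree`. Flat hits (dataset, sample, file or experiment records, each tagged with its dataset DOI) are grouped so that every `<gigadb_entry>` holds one dataset's records under a single `<gigadb_entries>` root. The `(doi, kind, id)` hit shape and the `doi` attribute on `<gigadb_entry>` are assumptions for illustration, not the API's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Tuple

# Hypothetical flat search hit: (dataset DOI, element name, record id).
Hit = Tuple[str, str, str]


def group_entries(hits: Iterable[Hit]) -> ET.Element:
    """Nest hits so that each <gigadb_entry> holds exactly one dataset's records."""
    by_doi: Dict[str, List[Tuple[str, str]]] = {}
    for doi, kind, rec_id in hits:
        # Dicts keep insertion order, so entries follow each DOI's first appearance.
        by_doi.setdefault(doi, []).append((kind, rec_id))

    root = ET.Element("gigadb_entries")
    for doi, records in by_doi.items():
        entry = ET.SubElement(root, "gigadb_entry", doi=doi)
        for kind, rec_id in records:
            ET.SubElement(entry, kind, id=rec_id)
    return root
```

Grouping on the server side like this means a client reading the XML never has to re-associate samples or files with their dataset.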

jgrethe commented Oct 13, 2016

One quick question about the API. Is there a way to "crawl" the API to extract metadata for all datasets so that they can be indexed by external services?

only1chunts (Member) commented Oct 14, 2016

Hi @jgrethe, at present the search functionality could return all data if you used a term present in every dataset, but it would probably time out. We are now looking into providing a full dump of all data as a precomputed file, updated daily (maybe weekly?), for easy download. This would provide an easy option for the first scrape of everything. After that, you could either pull everything again at a later date and do a "diff", or use the API to get only the datasets added since the date of the full dump.
Thanks for your interest.
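The "pull everything again and diff" option can be sketched in Python as a comparison of two dumps keyed by DOI. This assumes each dump has already been parsed into a mapping from DOI to a fingerprint of that dataset's XML. `digest` and `diff_dumps` are hypothetical helpers, since the dump format had not been decided at this point.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(xml_text: str) -> str:
    """Fingerprint one dataset's XML so that two dumps can be compared cheaply."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def diff_dumps(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two dumps given as {DOI: digest}; return (added, removed, changed) DOIs."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # DOIs present in both dumps whose content fingerprint differs.
    changed = sorted(doi for doi in old.keys() & new.keys() if old[doi] != new[doi])
    return added, removed, changed
```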

jgrethe commented Oct 14, 2016

Sounds great! A weekly dump would be an easy way to grab the information.

DennisSchwartz commented Oct 18, 2016

@only1chunts I'm also interested in the metadata. Do you have an idea about the format of the data and where it will be made available? :) Thanks.

Also, this query http://54.179.189.148/api/search?taxno=9606&datasettype=Genomic returns an error. Is there a place to report these kinds of bugs? (Or am I doing something wrong?)

only1chunts (Member) commented Oct 18, 2016

@DennisSchwartz The format will likely be the XML that the API currently outputs, but bear in mind this is still beta, so there may be some minor changes required before the final release. The details of the XML format can be found here.
We haven't yet decided exactly where the weekly dump will be kept.
Q. Would you prefer an API-style call, e.g. something like:
http://dev.gigadb.org/api/dataset?doi=all
or a direct download from our FTP server?
(In all honesty, I think we would probably just redirect the above call to the FTP file anyway, so I guess it could be both.)