UCSCSession objects breaking possibly because of some change on the UCSC side #113
Comments
Thanks @hpages for reporting.
library(rtracklayer)
# get cookie from https://genome-euro.ucsc.edu/cgi-bin/hgGateway
session <- browserSession()
# make a request to https://genome-euro.ucsc.edu/cgi-bin/hgTracks with previously obtained cookie
# however now the site is not functional without JS support
tracks <- rtracklayer:::ucscGet(session, "tracks", list())
Caveat: track names could be extracted with the Table Browser, though track mode information is only present in the Genome Browser. @lawremi, what are your thoughts on this?
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Security-Policy" content="default-src *; script-src 'self' blob: 'unsafe-inline' 'nonce-13incR3wfN2P97dHQvJzksSV2il0' code.jquery.com/jquery-1.9.1.min.js code.jquery.com/jquery-1.12.3.min.js code.jquery.com/ui/1.10.3/jquery-ui.min.js code.jquery.com/ui/1.11.0/jquery-ui.min.js code.jquery.com/ui/1.12.1/jquery-ui.js www.google-analytics.com/analytics.js www.googletagmanager.com/gtag/js www.samsarin.com/project/dagre-d3/latest/dagre-d3.js cdnjs.cloudflare.com/ajax/libs/bowser/1.6.1/bowser.min.js
cdnjs.cloudflare.com/ajax/libs/d3/3.4.4/d3.min.js cdnjs.cloudflare.com/ajax/libs/jquery/1.12.1/jquery.min.js cdnjs.cloudflare.com/ajax/libs/jstree/3.2.1/jstree.min.js cdnjs.cloudflare.com/ajax/libs/jstree/3.3.4/jstree.min.js cdnjs.cloudflare.com/ajax/libs/jstree/3.3.7/jstree.min.js cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js login.persona.org/include.js ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.js maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js d3js.org/d3.v3.min.js cdn.datatables.net/1.10.12/js/jquery.dataTables.min.js cdn.jsdelivr.net/npm/shepherd.js@11.0.1/dist/js/shepherd.min.js www.google.com/recaptcha/api.js; style-src * 'unsafe-inline'; font-src * data:; img-src * data:;">
<title>Human hg38 chr7:155,799,529-155,812,871 UCSC Genome Browser v461</title>
<meta http-equiv="Content-Script-Type" content="text/javascript">
<link rel="stylesheet" href="../style/HGStyle.css?v=1708368144" type="text/css">
<script async src="https://www.googletagmanager.com/gtag/js?id=G-G5K9F3K9H2"></script>
</head>
<body class="hgTracks cgi">
<center><div id="warnBox" style="display:none;">
<center><b id="warnHead"></b></center>
<ul id="warnList"></ul>
<center><button id="warnOK"></button></center>
</div></center>
<noscript><div class="noscript"><div class="noscript-inner">
<p><b>JavaScript is disabled in your web browser</b></p>
<p>You must have JavaScript enabled in your web browser to use the Genome Browser</p>
</div></div></noscript>
<script type="text/javascript" src="../js/jquery.js?v=1708368145"></script><script type="text/javascript" src="../js/utils.js?v=1708368145"></script><script type="text/javascript" nonce="13incR3wfN2P97dHQvJzksSV2il0">
function showWarnBox() {document.getElementById('warnOK').innerHTML=' OK ';var warnBox=document.getElementById('warnBox');warnBox.style.display='';document.getElementById('warnHead').innerHTML='Warning/Error(s):';window.scrollTo(0, 0);}
function hideWarnBox() {var warnBox=document.getElementById('warnBox');warnBox.style.display='none';var warnList=document.getElementById('warnList');warnList.innerHTML='';var endOfPage = document.body.innerHTML.substr(document.body.innerHTML.length-20);if(endOfPage.lastIndexOf('-- ERROR --') > 0) { history.back(); }}
document.getElementById('warnOK').onclick = function() {hideWarnBox();return false;};
window.onunload = function(){}; // Trick to avoid FF back button issue.
addPixAndReloadPage();// Google tag load (gtag.js)
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date()); gtag('config', 'G-G5K9F3K9H2');
// Google tag load end
$(document).ready(function() {
if (gtag) {
/* send db to ga4 as an event on page load */
gtag('event', 'hgTracksLoad', {'db': getDb()})
};
});</script>
</body>
</html>
<html><link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"></html>
|
Hi @maximilianh, is it possible to work directly with the HTML content and avoid parsing JS (for https://genome-euro.ucsc.edu/cgi-bin/hgTracks)? Can you please look into this? I hope we can find a fix for it, if one is possible. |
Can you give me a little more context? Our site has been requiring JS for at least 15 years; that hasn't changed. What I changed is that I added code that detects whether the "pix" session or URL variable (screen size) is set; if it's not set, the code determines the screen size and then reloads the page. I have no idea why this would interfere with rtracklayer, but it's something that has changed recently. Maybe some other change broke the rtracklayer parser, idk, I don't have enough information yet to make an educated guess. To get the list of tracks in a way that doesn't require parsing HTML, we have the "tracks" API endpoint, e.g. http://api.genome.ucsc.edu/list/tracks?genome=hg38. See api.genome.ucsc.edu for more documentation, or feel free to ask me. |
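For readers following along, here is a minimal base-R sketch of how the "tracks" endpoint mentioned above could be addressed. The endpoint and its `genome` parameter come from the comment above; the function names `ucsc_list_tracks_url()` and `ucsc_list_tracks()` are hypothetical and are not part of rtracklayer. The fetch helper assumes the jsonlite package and network access, so it is defined but not called.

```r
# Build the URL of the UCSC "list/tracks" API endpoint for one assembly.
# Endpoint and `genome` parameter are taken from the comment above;
# the function name is hypothetical.
ucsc_list_tracks_url <- function(genome,
                                 base = "http://api.genome.ucsc.edu") {
  stopifnot(is.character(genome), length(genome) == 1L, nzchar(genome))
  paste0(base, "/list/tracks?genome=",
         utils::URLencode(genome, reserved = TRUE))
}

# Fetch and decode the JSON answer. This needs network access and the
# jsonlite package (an assumption), so it is only defined here.
ucsc_list_tracks <- function(genome) {
  jsonlite::fromJSON(ucsc_list_tracks_url(genome))
}

url <- ucsc_list_tracks_url("hg38")
print(url)
```

The structure of the returned JSON is not described in this thread, so the sketch stops at decoding it.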
If you can tell me what exactly broke rtracklayer, I can try to do something to make it work again in the short run, but in the long run it would probably reduce the number of fire drills to start parsing JSON rather than HTML :-) |
@sanchit-saini You wrote "track mode information"; what do you mean by "track mode"? Do you mean the visibilities? If this is something that the API doesn't return, we will add the information to the API ASAP. I wonder if there is a reason why you are not using the API; if that's the case, we will absolutely have to fix that. |
Sorry for the late reply.
Why does rtracklayer need to scrape/parse HTML?
rtracklayer provides a command-line interface for interacting with the Genome Browser, which cannot be emulated with the REST API.
What are track modes?
If we open https://genome-euro.ucsc.edu/cgi-bin/hgTracks, we can see track names (e.g. Assembly), each with a drop-down offering the options hide, dense, squish, pack, and full. These options are referred to as track modes in rtracklayer.
What is the problem rtracklayer is facing?
Recently, https://genome-euro.ucsc.edu/cgi-bin/hgTracks stopped returning the full HTML page and now needs JS to function.
Request 1
$ curl -I -H "User-Agent: rtracklayer" 'https://genome-euro.ucsc.edu/cgi-bin/hgGateway'
Response 1
HTTP/1.1 200 OK
Date: Mon, 11 Mar 2024 14:31:40 GMT
Server: Apache/2.4.53 (Rocky Linux) OpenSSL/3.0.1
Set-Cookie: hguid.genome-euro=467621116_o5SInCvSI4NA3g2kOiGXGgtvMfad; path=/; domain=.ucsc.edu; expires=Thu, 31-Dec-2037 23:59:59 GMT
Vary: Accept-Encoding
Origin-Trial: Ats6dcpzFne+6Djws3arcMPv1F64iEOPnBrs3VjBzvGcrG+EAc1D0+uMm00BglPAQqBh5ZHPZPXHyFU+rHjxOwUAAABweyJvcmlnaW4iOiJodHRwczovL3Vjc2MuZWR1OjQ0MyIsImZlYXR1cmUiOiJBbGxvd1N5bmNYSFJJblBhZ2VEaXNtaXNzYWwiLCJleHBpcnkiOjE1OTc5NzA5MjUsImlzU3ViZG9tYWluIjp0cnVlfQ==
Content-Type: text/html; charset=UTF-8
Request 2
$ curl -H "User-Agent: rtracklayer" -b "hguid.genome-euro=467621116_o5SInCvSI4NA3g2kOiGXGgtvMfad" 'https://genome-euro.ucsc.edu/cgi-bin/hgTracks'
Response 2
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<meta http-equiv='Content-Security-Policy' content="default-src *; script-src 'self' blob: 'unsafe-inline' 'nonce-eaR99FCvJjT3Qxw3ya71duJlfY2j' code.jquery.com/jquery-1.9.1.min.js code.jquery.com/jquery-1.12.3.min.js code.jquery.com/ui/1.10.3/jquery-ui.min.js code.jquery.com/ui/1.11.0/jquery-ui.min.js code.jquery.com/ui/1.12.1/jquery-ui.js www.google-analytics.com/analytics.js www.googletagmanager.com/gtag/js www.samsarin.com/project/dagre-d3/latest/dagre-d3.js cdnjs.cloudflare.com/ajax/libs/bowser/1.6.1/bowser.min.js cdnjs.cloudflare.com/ajax/libs/d3/3.4.4/d3.min.js cdnjs.cloudflare.com/ajax/libs/jquery/1.12.1/jquery.min.js cdnjs.cloudflare.com/ajax/libs/jstree/3.2.1/jstree.min.js cdnjs.cloudflare.com/ajax/libs/jstree/3.3.4/jstree.min.js cdnjs.cloudflare.com/ajax/libs/jstree/3.3.7/jstree.min.js cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js login.persona.org/include.js ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.js maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js d3js.org/d3.v3.min.js cdn.datatables.net/1.10.12/js/jquery.dataTables.min.js cdn.jsdelivr.net/npm/shepherd.js@11.0.1/dist/js/shepherd.min.js www.google.com/recaptcha/api.js; style-src * 'unsafe-inline'; font-src * data:; img-src * data:;">
<TITLE>Human hg38 chr7:155,799,529-155,812,871 UCSC Genome Browser v461</TITLE>
<META http-equiv="Content-Script-Type" content="text/javascript">
<link rel='stylesheet' href='../style/HGStyle.css?v=1708368144' type='text/css'>
<script async src="https://www.googletagmanager.com/gtag/js?id=G-G5K9F3K9H2"></script>
</HEAD>
<BODY CLASS="hgTracks cgi">
<center><div id='warnBox' style='display:none;'><CENTER><B id='warnHead'></B></CENTER><UL id='warnList'></UL><CENTER><button id='warnOK'></button></CENTER></div></center>
<noscript><div class='noscript'><div class='noscript-inner'><p><b>JavaScript is disabled in your web browser</b></p><p>You must have JavaScript enabled in your web browser to use the Genome Browser</p></div></div></noscript>
<script type='text/javascript' SRC='../js/jquery.js?v=1708368145'></script>
<script type='text/javascript' SRC='../js/utils.js?v=1708368145'></script>
<script type='text/javascript' nonce='eaR99FCvJjT3Qxw3ya71duJlfY2j'>
function showWarnBox() {document.getElementById('warnOK').innerHTML=' OK ';var warnBox=document.getElementById('warnBox');warnBox.style.display='';document.getElementById('warnHead').innerHTML='Warning/Error(s):';window.scrollTo(0, 0);}
function hideWarnBox() {var warnBox=document.getElementById('warnBox');warnBox.style.display='none';var warnList=document.getElementById('warnList');warnList.innerHTML='';var endOfPage = document.body.innerHTML.substr(document.body.innerHTML.length-20);if(endOfPage.lastIndexOf('-- ERROR --') > 0) { history.back(); }}
document.getElementById('warnOK').onclick = function() {hideWarnBox();return false;};
window.onunload = function(){}; // Trick to avoid FF back button issue.
addPixAndReloadPage();// Google tag load (gtag.js)
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date()); gtag('config', 'G-G5K9F3K9H2');
// Google tag load end
$(document).ready(function() {
if (gtag) {
/* send db to ga4 as an event on page load */
gtag('event', 'hgTracksLoad', {'db': getDb()})
};
});</script>
</BODY>
</HTML>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css"> |
Sorry, I don't understand, can you give me more context? In this example, rtracklayer needs the list of tracks for one assembly. We have an API call for that. Why is rtracklayer parsing HTML to get the list of tracks?
Track modes: OK, great. We call them "visibilities", but the word doesn't matter. Thanks for explaining it.
hgTracks?hgsid=xxxx&pix=800 your code should work as before, even today (I hope). What I can do on our end with the next code release in two weeks is to suppress the JavaScript entirely if the user agent is "rtracklayer". I did this on my internal testing website; can you check whether your code works here: |
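As an illustration of the URL shape mentioned above, the following base-R sketch appends an explicit `pix` value to an hgTracks request, on the assumption (taken from the comments in this thread, not verified here) that the reload script only runs when `pix` is unset. The function name `hgtracks_url()` is hypothetical, and "xxxx" stands in for a real session id exactly as in the comment.

```r
# Compose an hgTracks URL that carries an explicit `pix` (screen width)
# value. The parameter names hgsid and pix are taken from the thread;
# the function name and the default base URL are illustrative choices.
hgtracks_url <- function(hgsid, pix = 800L,
                         base = "https://genome-euro.ucsc.edu/cgi-bin/hgTracks") {
  stopifnot(length(hgsid) == 1L, length(pix) == 1L, !is.na(pix))
  paste0(base,
         "?hgsid=", utils::URLencode(as.character(hgsid), reserved = TRUE),
         "&pix=", as.integer(pix))
}

example_url <- hgtracks_url("xxxx")
print(example_url)
```

Whether a given rtracklayer version sends `pix` this way is not stated in the thread, so this only shows how such a request could be formed.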
@sanchit-saini This problem will come up again whenever we change our HTML. In our group, we don't understand why, instead of parsing the HTML, you cannot use an API call to get the list of track names... Your PR fixes it, but older rtracklayer versions will be broken. Should I commit the fix from https://hgwdev-max.gi.ucsc.edu/cgi-bin/hgTracks and get it released in two weeks? |
Maybe @hpages has some idea on why UCSC doesn't understand @sanchit-saini 's reply? |
Thanks @maximilianh, adding
Yes, I tested it, and it seems to be working without the
Essentially, through these and other functions, we can interact with the Genome Browser from the command line. rtracklayer internally achieves this by mimicking requests to the Genome Browser and parsing the response HTML. I hope it is clear now why we cannot use the UCSC REST APIs: this feature depends on interacting with the Genome Browser, which cannot be achieved with the REST APIs. |
I'm not really sure I understand it either. Anyways, I've started to use UCSC REST API instead of |
Thanks @hpages!
@sanchit-saini I didn't know about browseGenome(). Thanks for the
explanations. I hope it's not too widely used, because it means that every
time we change our HTML (which unfortunately happens rarely, as we spend
most of our time chasing bugs in parsers for tab-sep files for >20
different input databases... sigh...), we will break rtracklayer and often
browseGenome().
Is there some automated test that we can run to check if we broke
rtracklayer with a change? Is there a way you could run such a test? Our
development site is genome-test.gi.ucsc.edu and that's built every night.
If you could run a daily test against that, we can catch these problems,
all problems, before they reach the public site.
|
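To make the idea of a very short automated check concrete, here is a base-R sketch of one possible heuristic: it flags an hgTracks response that looks like the JavaScript-only page quoted earlier in this thread. The marker string `addPixAndReloadPage(` is taken from that quoted HTML; the function name `is_js_reload_stub()` and the two sample strings are made up for illustration and are not rtracklayer code.

```r
# Heuristic check: does an hgTracks response look like the JavaScript-only
# page quoted earlier in this thread instead of the full track page?
# The function name and the choice of marker string are assumptions.
is_js_reload_stub <- function(html) {
  html <- paste(html, collapse = "\n")  # accept readLines()-style input
  grepl("addPixAndReloadPage(", html, fixed = TRUE)
}

# Made-up sample inputs, for illustration only.
stub_page <- c("<script type='text/javascript'>",
               "window.onunload = function(){};",
               "addPixAndReloadPage();</script>")
full_page <- "<select name='someTrack'><option>hide</option><option>pack</option></select>"

stopifnot(is_js_reload_stub(stub_page))
stopifnot(!is_js_reload_stub(full_page))
```

In a scheduled job, `html` would be the body of a real response from the development site; how that response is fetched is left open here.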
I imagine that the closing of this ticket means that some automated test
(which could be as short as four-five lines) is not something that you
consider important. That's OK with me, but this topic will come up
again... :-)
We're planning to change our HTML significantly over the next two years.
|
@maximilianh Yes, this feature is not used widely. |
Yes, if you just have a few lines that test a few basic things, of the
most-used functions, so probably not browseGenome(), idk, something that is used
a lot, then you can run these every day or so against genome-test. I wasn't
aware that rtracklayer has no automated tests... if you can point such
tests to genome-test.gi.ucsc.edu then you will find bugs right away, before
we release them, and can let us know.
…On Wed, Mar 13, 2024 at 9:33 PM Sanchit Saini ***@***.***> wrote:
@maximilianh <https://github.com/maximilianh> Yes, this feature is not
used widely.
At this moment, we don't have any tests to check it, though I can write
tests. I will try to make these tests portable so you folks can test them
on your end too.
|
It has tests for non-network-related features. Now we will also add tests for the missing features and try to set up a GitHub Action or some other automation to run those tests periodically. You can expect the tests to be completed around the end of this month. |
Great!! This sounds like a great idea to make rtracklayer more stable, and
it also helps with our worrying about breaking other software when we touch our
site.
|
hi there, I don't really understand the thread details (sorry!), but I am interested in the bottom line. From an ordinary user perspective, is I've been running into similar errors as the ones that Herve reported at the top of the thread thanks! Janet |
Hi @maximilianh, it took a bit longer than I expected. I have created PR #120, which covers most of the commonly used functions. A few functions are missing, and I will add tests for them soon. |
Hi @jayoung, we are constantly trying to maintain stability and add features to the rtracklayer package. If there's a change on the UCSC side, the browseGenome() function will occasionally break, because it's implemented by mimicking requests and parsing responses (aka web scraping). That was also the case for this issue. However, to avoid these kinds of problems in the future, we have put some test cases in place, which will help us know if something goes wrong so we can fix it immediately. For plots, you have to look at what your use case is and which package comes closest to solving it. |
Looks like maybe something has changed on the UCSC side a few days ago that breaks UCSCSession objects:
This is in release (rtracklayer 1.62.0, BioC 3.18) and devel (rtracklayer 1.63.1, BioC 3.19).
This breaks packages customProDB, GenomicFeatures, and goseq on all platforms in release and devel:
H.