
Feature python3 support lsh tweaks #364

Merged: 10 commits, Dec 5, 2017

Conversation

@lsh-0 (Contributor) commented Nov 29, 2017

... ok.

the first and most important point: I have the tests running for python3, so thanks to @seanwiseman for all the hard work.

but also:

  • updated fixtures
  • updated api-raml
  • updated install script so it uses python3
  • the scripts (scrape-article, scrape-random, etc) are now working again
  • update-api-raml no longer dies when it tries to change back to the previous dir (if you're using a symlink to a shared api-raml dir like me)

some controversial edits:

I haven't gone through the original PR too closely yet.

@lsh-0 (Contributor, Author) commented Nov 29, 2017

ok, I see there is an issue unpickling py2 objects in py3: requests-cache/requests-cache#83

I'll see if I can't convert it

@lsh-0 (Contributor, Author) commented Nov 29, 2017

problem can be replicated with this:

import sqlite3
c = sqlite3.connect('cache/requests-cache.sqlite3')
cur = c.cursor()
results = cur.execute('select value from responses limit 1').fetchone()
value = results[0]  # first value of only row

import pickle
pickle.loads(value, encoding='utf-8')  # fails: the py2-pickled value can't be decoded as utf-8

hrm, I can get the value to unpickle in python 3 by changing it to:

import _dummy_thread as dummy_thread  # py3 name for the module py2 pickles reference as dummy_thread

import sqlite3
c = sqlite3.connect('cache/requests-cache.sqlite3')
cur = c.cursor()
results = cur.execute('select value from responses limit 1').fetchone()
value = results[0]

import pickle
value = pickle.loads(value, encoding='bytes')  # decode py2 str objects as bytes

but there are so many broken references after that. we should think about rebuilding the cache.
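For illustration, the py2-vs-py3 string difference behind this can be seen with a hand-built protocol-0 pickle of a py2 str (a minimal sketch; the bytes literal below is a synthetic example, not data from the cache):

```python
import pickle

# b"S'abc'\n." is the protocol-0 pickle python 2 produces for the str 'abc'
py2_pickle = b"S'abc'\n."

# default decoding turns the py2 str into a py3 str
print(pickle.loads(py2_pickle))                    # abc
# encoding='bytes' keeps it as raw bytes, matching what py2 code actually stored
print(pickle.loads(py2_pickle, encoding="bytes"))  # b'abc'
```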

I'll change the cache db path to include the version of python or leave unchanged if python 2
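A version-suffixed cache path could look something like this (a sketch; the base name and the leave-unchanged-on-python-2 rule follow the comment above, and the helper name is hypothetical):

```python
import sys

def cache_db_path(base='cache/requests-cache'):
    # hypothetical helper: suffix the cache db with the python version on py3,
    # but leave the original name alone on py2 so the existing cache keeps working
    if sys.version_info[0] == 2:
        return base + '.sqlite3'
    return '%s-py%d.%d.sqlite3' % (base, sys.version_info[0], sys.version_info[1])

print(cache_db_path())
```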

@lsh-0 (Contributor, Author) commented Nov 29, 2017

ha - er. the moment I push this change it's probably going to thrash iiif once it gets past the app tests and I'm signing off in a moment. Attaching the change as a patch.

patch.diff.txt

@seanwiseman (Contributor) commented:

Nice work @lsh-0. The elife-tools commit it uses is fully Python 2/3 compatible. There were a few import changes I had to make to get it to work when imported as a dependency. I will now tidy up that PR and get it into the develop branch; once done I will update the requirements here.

@giorgiosironi (Contributor) commented:

We could boot iiif--ci (4 servers), but we don't want to do that for every build. Down the road we could consider deploying several containers to a shared infrastructure to handle the build.

So I think I'll turn on iiif--ci, lock it, apply the patch along with a URL to build the cache, then revert it.

@giorgiosironi (Contributor) commented:

Nope, that won't work: when we switch back, the URL will be different.

@giorgiosironi (Contributor) commented:

Tweak

Parallel(n_jobs=-1)(delayed(render)(path, json_output_dir) for path in paths)
to use just 2 of the 4 processes, wait for a very long build, then turn it back to the original?
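The effect of capping the worker count can be sketched with stdlib concurrency (a sketch using concurrent.futures rather than the joblib call above; render here is a stand-in for the real render(path, json_output_dir)):

```python
from concurrent.futures import ThreadPoolExecutor

def render(path):
    # stand-in for rendering one article's XML to article-json
    return path.upper()

paths = ["elife-1.xml", "elife-2.xml", "elife-3.xml"]

# max_workers=2 caps concurrent work at 2, instead of one worker per core,
# throttling the flood of requests a full run would otherwise make
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(render, paths))

print(results)  # ['ELIFE-1.XML', 'ELIFE-2.XML', 'ELIFE-3.XML']
```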

@lsh-0 (Contributor, Author) commented Nov 30, 2017

that would definitely slow the huge number of requests made to iiif, but each article still does N requests, so it would be a steady stream instead of a waterfall.

I'll do as you suggest though and keep an eye on the iiif server. We do have a ticket open somewhere to ensure iiif can survive a backfill.

- we're going to be stuck with it from here on out, so let's not suffix it with 'py3'
- reduced the number of processes used while generating article-json so we don't flood iiif; this change should be reverted once the cache is rebuilt
@lsh-0 (Contributor, Author) commented Dec 1, 2017

ok - dropped the number of processes down to 2 and changed the cache name; it shouldn't thrash iiif now, but I'll keep an eye on the alerts

@lsh-0 (Contributor, Author) commented Dec 1, 2017

ah - green is a great little tool, but its behaviour on encountering import errors is to swallow them quietly and report 'no tests found'. It looks like a chunk of tests with import src.foo.bar style imports were not being run. Fixing them up now.
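One way to surface imports a runner might swallow is a quick pre-flight loop (a sketch; the module names below are illustrative, not the project's real test modules):

```python
import importlib

# try importing each test module up front so ImportErrors fail loudly,
# rather than being silently reported as 'no tests found'
for name in ["json", "sqlite3", "src.foo.bar"]:
    try:
        importlib.import_module(name)
        print(name, "ok")
    except ImportError as exc:
        print(name, "FAILED:", exc)
```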

@lsh-0 (Contributor, Author) commented Dec 1, 2017

cool - it's going through the corpus now and I can see all the cache misses

@lsh-0 (Contributor, Author) commented Dec 1, 2017

(would love to know where this is originating from:)

/ext/jenkins-libraries-runner/workspace/or_PR-364-CEJKMGU523MRCQW4DFGD/venv/lib/python3.5/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available XML parser for this system ("lxml-xml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 50 of the file src/generate_article_json.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml-xml")

afaict elife-tools is doing the right thing, and it only instantiates bs4 once
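To find where a warning like this actually originates, one option is to promote it to an exception so the traceback points at the real call site (a sketch using the stdlib warnings filter):

```python
import warnings

# escalate this specific UserWarning to an exception; the resulting
# traceback then shows exactly which call triggered it
warnings.filterwarnings("error", message="No parser was explicitly specified")

try:
    warnings.warn("No parser was explicitly specified, so ...", UserWarning)
except UserWarning as exc:
    print("warning raised as error:", exc)
```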


@gnott (Member) commented Dec 1, 2017

The warning is from https://github.com/elifesciences/elife-tools/blob/develop/elifetools/parseJATS.py#L15

    return BeautifulSoup(xml, ["lxml", "xml"])

should be

    return BeautifulSoup(xml, "lxml-xml")

I believe it became like this after the BeautifulSoup version in use was incremented. I'll see if it works and arrange a PR on the elife-tools project.

@lsh-0 (Contributor, Author) commented Dec 3, 2017

updating requirements.txt to install elifetools as an editable clone and changing that line (which I thought was still valid anyway) to what was requested didn't suppress the warning for me. Maybe I did something wrong.

@lsh-0 (Contributor, Author) commented Dec 3, 2017

build failed with:


INFO - 2017-12-01 08:13:53,806 - Requesting url https://prod--iiif.elifesciences.org/lax:13964/elife-13964-fig4-v3.tif/info.json (cache key '5fcd4b882d0d2ef2e6850167a0aa6b12a76e546a2ed91ef55b56875045ba0aee') -- {"stack_info": null}
/ext/jenkins-libraries-runner/workspace/or_PR-364-CEJKMGU523MRCQW4DFGD/venv/lib/python3.5/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available XML parser for this system ("lxml-xml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 882 of the file /usr/lib/python3.5/threading.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml-xml")

  markup_type=markup_type))
INFO - 2017-12-01 08:13:53,982 - elife-13943-v2.xml -> elife-13943-v2.xml.json => success -- {"stack_info": null}
/ext/jenkins-libraries-runner/workspace/or_PR-364-CEJKMGU523MRCQW4DFGD/venv/lib/python3.5/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available XML parser for this system ("lxml-xml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 882 of the file /usr/lib/python3.5/threading.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml-xml")

  markup_type=markup_type))
INFO - 2017-12-01 08:13:54,036 - elife-13943-v1.xml -> elife-13943-v1.xml.json => success -- {"stack_info": null}
Terminated
script returned exit code 143

investigating

@lsh-0 (Contributor, Author) commented Dec 4, 2017

this failure:

INFO - 2017-12-04 03:11:29,953 - Requesting url https://prod--iiif.elifesciences.org/lax:03043/elife-03043-fig7-v1.tif/info.json (cache key 'c3e4b1e37b2dc2949011c289cfda6f1364d0364577287354eb985c7cb0221c24') -- {"stack_info": null}
INFO - 2017-12-04 03:11:30,179 - elife-03043-v1.xml -> elife-03043-v1.xml.json => success -- {"stack_info": null}
/ext/jenkins-libraries-runner/workspace/or_PR-364-CEJKMGU523MRCQW4DFGD/venv/lib/python3.5/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available XML parser for this system ("lxml-xml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 50 of the file src/generate_article_json.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml-xml")

  markup_type=markup_type))
INFO - 2017-12-04 03:11:30,275 - elife-03035-v1.xml -> elife-03035-v1.xml.json => success -- {"stack_info": null}
Sending interrupt signal to process
/ext/jenkins-libraries-runner/workspace/or_PR-364-CEJKMGU523MRCQW4DFGD/venv/lib/python3.5/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available XML parser for this system ("lxml-xml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 882 of the file /usr/lib/python3.5/threading.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml-xml")

  markup_type=markup_type))
INFO - 2017-12-04 03:11:30,906 - elife-03032-v1.xml -> elife-03032-v1.xml.json => success -- {"stack_info": null}
Terminated
script returned exit code 143

@lsh-0 (Contributor, Author) commented Dec 4, 2017

@giorgiosironi, is there a timeout on the builds? I see 'Sending interrupt signal to process' this time.

@giorgiosironi (Contributor) commented:

Yes, there is a default timeout of 2 hours. I'll override this.

@gnott (Member) commented Dec 4, 2017

elifesciences/elife-tools#256 in elife-tools will use "lxml-xml", and hopefully it will get rid of the warning (once merged and integrated).

@lsh-0 (Contributor, Author) commented Dec 4, 2017

excellent, thanks both. I'll integrate your change @gnott and push it up to trigger the rebuild. I think it gets about 2/3 of the way through the corpus before it starts getting cache misses. I see it got rebuilt on Giorgio's change to the Jenkins file.

- this should suppress the warning about the parser being used
@lsh-0 lsh-0 merged commit 2f783a6 into feature-python3-support Dec 5, 2017
@lsh-0 lsh-0 deleted the feature-python3-support-lsh-tweaks branch December 5, 2017 01:07