I think this should be easy to fix with a more sophisticated splitting regex: rather than splitting on ),(, we can split on \'\),\(([1-9][0-9]*,[0-9]+,\')
While this isn't absolutely foolproof (e.g. someone could craft an adversarial redirect that would still break the split), it's quite unlikely that this will happen.
),( is already unlikely, and '),(1,0,' is much more unlikely, as it additionally requires:
' before
number comma number comma ' after
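To make the idea concrete, here is a minimal Python sketch of the stricter split. It is not the pipeline's actual code: the names `ROW_BOUNDARY` and `split_rows` are hypothetical, the sample string is made up, and it assumes every row ends in a quoted string and starts with two integer fields, which is what the proposed regex implies. It also swaps the capture group for lookarounds, so the quote and the leading numbers stay attached to their rows instead of being consumed by the split.

```python
import re

# Match "),(" only when it is preceded by a closing quote and followed by
# "<number>,<number>,'". Lookarounds leave that context in the rows.
# (Variation on the regex above; an assumption, not the pipeline's code.)
ROW_BOUNDARY = re.compile(r"(?<=')\),\((?=[1-9][0-9]*,[0-9]+,')")


def split_rows(values: str) -> list[str]:
    """Split the body of a VALUES clause (outer parentheses stripped) into rows."""
    return ROW_BOUNDARY.split(values)


# Made-up sample: the second title itself contains "),(".
sample = "1,0,'Foo','',''),(2,0,'Bar_),(_baz','',''),(3,0,'Qux','',''"

naive_rows = sample.split("),(")  # breaks the second row in two
rows = split_rows(sample)         # keeps the second row intact

for row in rows:
    print(row)
```

A title that contains the full '),(1,0,' pattern would still fool this split, as noted above.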
I'm rewriting the pipeline in snakemake to make it easier to debug and run in parallel.
I'm trying to import the dump from enwiki-20221001.
This ends up creating the following line in pages.txt.gz, which has the wrong title and only two columns instead of three:
Here's some context for surrounding lines:
I will do some more research on this shortly.