wget.sh generated but nothing follows #6
Comments
I will. Basically, it's about adding some
What OS are you running? Do you have any output from the command I've tried to run (exactly as you did, except I don't need I suggest you remove the temporary directory (the Hope this helps Result on my machine
I'm running Mac OS X 10.9.5, and here's the requested output:
Anything weird in that output? I have tried refreshing the
More pointers to my config:
(I had to install wget through Homebrew.)
Okay, I just ran your script on Xubuntu, and it works fine. Last question: what do I need to set to scrape old messages? The default settings seem to have scraped only a tiny fraction of the emails, and I suppose that the ones that got scraped are the most recent. Thanks again for your help!
Perfect. I don't have a Mac to test on; I will ask someone to improve the script
By default, For example, I can fetch a 4-year archive of my group (http://l.archlinuxvn.org/archlinuxvn/). After I fetch all messages, I need to run
Hmm, I have run the following commands, and my
Similarly, I get only one file in Sorry if my questions are very basic. I'm struggling to understand how this all works.
Let me check. There may be something wrong with the script!
I've fixed the regular expression issue in the last two commits. Please try to run Thanks a lot!
The scraper has been running for some time now, and everything seems to be alright with Thanks a lot!
Ah, my bad, it's not Thanks again for your patience. I will reopen this ticket because there was a problem with Mac support.
As far as I can tell, it's not your fault: it must have to do with the versions of Also note that I am using Mac OS X 10.9.5, which is quite old by now (current OSX is 11.0). What versions of
I understand. My versions are
@Gnouc I thought it's due to a
As I recall that won't work on
@icy: If you used GNU tools, you're fine with that.
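For anyone comparing setups, here is a minimal sketch for checking whether a tool looks like the GNU implementation. It is not part of the crawler script: the `tool_flavor` helper and the choice of `sed` and `wget` are assumptions for illustration, and it relies on GNU tools printing "GNU" in their `--version` output while BSD `sed` rejects that flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of crawler.sh.
# Prints "GNU", "non-GNU", or "missing" for the given command name.
tool_flavor() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing"
  elif "$1" --version 2>/dev/null | grep -q 'GNU'; then
    # GNU tools accept --version and mention GNU in the banner.
    echo "GNU"
  else
    # BSD sed and similar tools reject --version or print no GNU banner.
    echo "non-GNU"
  fi
}

for tool in sed wget; do
  printf '%s: %s\n' "$tool" "$(tool_flavor "$tool")"
done
```

The result is only a hint about which implementation is first on the PATH; it says nothing about which version is installed.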
Confirmed working on FreeBSD 10.2 as well, with GNU wget from FreshPorts.
Me too :-).
@icy: I just ran a quick test on OS X 10.10.5
@luk4hn What's the point of If you are worried about the newline character on OS X, then insert it literally:
or using
@Gnouc: Hehe, I just tried to point out the problem.
@luk4hn Can you please send a pull request?
Fix #6: pass \n to sed as ANSI-C quoting

The BSD version of sed won't interpret '\n' as a newline character. Passing '\n' to sed with ANSI-C quoting avoids this problem.
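To illustrate the idea behind that fix, here is a minimal sketch of ANSI-C quoting with sed. It is not the actual change made to the crawler script: the `split_fields` helper and the comma-separated input are made up for the example, and it assumes bash, since `$'...'` is not available in every POSIX shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from crawler.sh: put each comma-separated
# field on its own line.
# Inside $'...', bash turns \\ into a backslash and \n into a real newline,
# so sed receives "s/,/\<newline>/g". A backslash followed by a literal
# newline in the replacement is accepted by both GNU and BSD sed.
split_fields() {
  sed $'s/,/\\\n/g'
}

printf 'a,b,c\n' | split_fields
```

Writing the replacement as a plain `'s/,/\n/g'` relies on sed itself translating `\n`, which GNU sed does and BSD sed does not.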
Hi,
Would you mind adding some notes on how to troubleshoot the script?
I'm trying to download this list with the following parameters:
The next commands then generate the wget.sh file and try to run it, but the file itself does not seem to run on anything:

    ./crawler.sh -sh > wget.sh
    bash wget.sh
Thanks in advance for any pointers. The wget.sh file I get is copied below.