How do you add artists to scrape or access the web interface #64

Open
catlover1019 opened this issue Oct 20, 2019 · 7 comments

@catlover1019

Hi. I recently downloaded this, and have tried to use it. I'm wondering how one would actually add artists to scrape, or use the web interface. I know these are very basic questions, but I'm new to all of this.

@fake-name
Owner

The README.md is a bit terse, but has all the important bits in it.

Configuration is done via a file named settings.py, which must be placed in the
repository root. settings.base.py is an example config to work from.
In general, you will probably want to copy settings.base.py to settings.py, and then
add your various usernames/passwords/database config.

settings.py is also where the login information for the various plugins goes.

So copy settings.base.py and rename it to settings.py. Set up the accounts you care about in there.

The DB backend is selected via the USE_POSTGRESQL parameter in settings.py.

Unless you have a postgres instance, set USE_POSTGRESQL to False.

If using Postgres, DB setup is left to the user. xA-Scraper requires its own database,
and the ability to make IP-based connections to the hosting PG instance. The connection
information, DB name, and client name must be set in settings.py.

When using sqlite, you just have to specify the path to where you want the sqlite db to
be located (or you can use the default, which is ./sqlite_db.db).

The default path in settings.base.py is probably fine, unless you want to put it somewhere else.
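
For illustration, the DB chunk of settings.py would look something like this. USE_POSTGRESQL and the default sqlite path are real; the other key names here are guesses, so check settings.base.py for the actual ones:

    # settings.py -- DB backend selection (sketch; every key name other
    # than USE_POSTGRESQL below is illustrative, see settings.base.py)
    USE_POSTGRESQL = False            # set True only if you have a PG instance

    # sqlite backend: just the path to the db file (the default is fine)
    SQLITE_PATH = "./sqlite_db.db"    # hypothetical key name

    # Postgres backend: IP-based connection info for your PG instance
    DATABASE_IP   = "127.0.0.1"       # hypothetical key names, all four
    DATABASE_NAME = "xa_scraper"      # xA-Scraper needs its own database
    DATABASE_USER = "scraper"
    DATABASE_PASS = "change-me"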

Disabling of select plugins can be accomplished by commenting out the appropriate
line in main.py. The JOBS list dictates the various scheduled scraper tasks
that are placed into the scheduling system.

You can now also comment out the config section for plugins you don't want active. I should update the readme.
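
Roughly, that looks like this in main.py (the entry names and format here are a guess; the real JOBS entries may carry scheduling intervals and such):

    # main.py (sketch -- job names/format are illustrative)
    JOBS = [
        fa_scraper_job,     # FurAffinity
        # wy_scraper_job,   # commented out -> never gets scheduled
        ng_scraper_job,     # Newgrounds
    ]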

The preferred bootstrap method is to use run.sh from the repository root. It will
ensure the required packages are available (build-essential, libxml2, libxslt1-dev,
python3-dev, libz-dev), and then install all the required python modules in a local
virtualenv. Additionally, it checks if the virtualenv is present, so once it's created,
./run.sh will just source the venv and run the scraper without any reinstallation.

run.sh currently works on Ubuntu (and maybe WSL; there's another user who apparently runs the project under WSL, but I don't do any testing there). I basically just use Ubuntu 18.04 LTS server for everything, but the desktop distro should be fine too.

run.sh is blocking, and when it's active you can then access the web interface. The startup process should print out the server port and IP (you can set them in the settings). It's generally http://<server_system_ip>:6543 unless you've changed it.
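
For reference, the relevant settings would be something like this (key names are guesses, check settings.base.py for the real ones):

    WEB_INTERFACE_IP   = "0.0.0.0"   # hypothetical key name
    WEB_INTERFACE_PORT = 6543        # the default port mentioned above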

If you've never done anything Linux-related before, you're probably going to have a hard time.

@Copy-link
Contributor

Copy-link commented Oct 20, 2019

There's no need to use WSL for running this on Windows. As long as you're willing to put up with command prompts (I use RBTray to minimize them to the systray), there's really no reason to go digging up an old machine to run a Linux server on.

The process is simple. After installing Python on your computer, navigate to the xA-Scraper directory and run pip install -r requirements.txt to download all the dependencies. Set up settings.py as @fake-name described above, then run python db_migrate.py db upgrade && python main.py to launch the web interface.

Access the web interface through the URL provided and add the artists you want to follow. What @fake-name does is leave this running so it automatically downloads new stuff in the background every so often. I have no idea if that actually works on Windows because I've never tried it. Instead, I just close the web interface and the command prompt that was housing it, and then I run python -m manage reset-run-state && python -m manage fetch xx (replacing xx with the initials for the site you want to scrape, e.g. fa, wy, ng...).

I make batch scripts for all of this to expedite the entire process, including one that will scrape all the sites in order (except I make sure to put fa last since that one is by far the slowest).
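
If batch files aren't your thing, a rough Python equivalent of that all-sites script would be something like this (same commands as above; the site list is just the initials I mentioned, with fa last):

    import subprocess
    import sys

    # scrape every site in order; fa goes last since it's by far the slowest
    SITES = ["wy", "ng", "fa"]

    def run(*args):
        # run the given command with the current interpreter, bail on failure
        subprocess.run([sys.executable, *args], check=True)

    run("-m", "manage", "reset-run-state")
    for site in SITES:
        run("-m", "manage", "fetch", site)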

I run a PostgreSQL instance on Windows as well (also not using WSL), since it is much faster than SQLite, but that's a bit more complicated to set up. I've been thinking of adding a script to the repo (if @fake-name would allow it) that would automate the setup of PostgreSQL. I would need a more elegant way of putting it into the system tray first, however.

@fake-name
Owner

fake-name commented Oct 20, 2019

there's really no reason to go digging up an old machine for running a linux server on.

VMs are a thing ;)

I run a PostgreSQL instance on Windows as well (also not using WSL), since it is much faster than SQLite.

To be fair, sqlite has perf issues only for the web view, and only when cold.

Basically, when you first access the web view after a long period of time (many minutes), the database will not be cached, so sqlite has to read it from disk, which can take as much as 60 seconds in my experience. Once the db contents are in the filesystem cache, it works fine.

On the other hand, it's barely tested, so there may be issues I don't know about.
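
If the cold-start lag bothers you, one crude workaround is to warm the filesystem cache yourself before opening the web view, something like:

    # read the sqlite file once so the OS caches its pages
    # (path is the default from settings.base.py)
    with open("./sqlite_db.db", "rb") as f:
        while f.read(1 << 20):    # 1 MiB chunks until EOF
            pass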

Personally, I have a really, really big postgres (128GB RAM, 5+ TB of tables) instance that's shared across all my projects, so I basically just use that by default for everything.

@Copy-link
Contributor

Copy-link commented Oct 20, 2019

VMs are a thing ;)

Hyper-V's Linux support is horrid (unless you use Microsoft's pre-packaged Ubuntu Desktop 18.04 LTS image, and quite frankly I can't stand the Unity DE), and toggling Hyper-V off to use something like VirtualBox is a pain in the ass. And unfortunately I currently need Hyper-V to remain enabled.

When 20H1 is out, this thankfully won't be an issue: Hyper-V will no longer be a big compatibility obstacle, and WSL2 will provide 99% compatibility with Linux CLI packages.

@fake-name
Owner

fake-name commented Oct 20, 2019

I ran a pile of Linux VMs under Hyper-V in ~2012-2014 or so and it worked fine.

If you're already on top of Hyper-V, I can see using VirtualBox or something being problematic.

Unity DE

They dropped that for 18.04? (The last release that shipped Unity was 17.04.) GNOME 3 is also pretty goddamn bad, but switching to Xfce isn't hard.

@Copy-link
Contributor

Copy-link commented Oct 20, 2019

I was not very successful in getting it to mount my local drives so that data could be properly exchanged. Clipboard sharing was also broken, and good god is it a pain to manually type out commands from my notes instead of just copying and pasting.

They dropped that for 18.04?

They turned it into a GNOME skin, essentially. As for Xfce, trying to switch to it broke the clipboard sharing, so I gave up on it.

With 20H1 just around the bend, it doesn't seem worth putting energy into, especially when native Windows doesn't really have any current issues with your project aside from the multiprocessing (which you've stated is in dire need of a rework anyway).

@catlover1019
Author

I use Linux Mint as my main OS, which at its core is just the same Ubuntu 18.04 LTS that this is developed on, so I'm good there.

I'll try it again and get back to you. I do think the one piece of information the readme's missing is that management, such as adding artists, is done through the web interface. It doesn't explicitly say so, and I for some reason didn't assume so.
