Autosave triggered by single thread and not global. #56

Open
1 task done
rivermont opened this issue Oct 13, 2017 · 2 comments

Comments

@rivermont
Owner

rivermont commented Oct 13, 2017

Checklist

  • Same issue has not been opened before.

Expected Behavior

All threads should stop while the crawler prints info and saves files.

Actual Behavior

Once one thread reaches SAVE_COUNT crawled links, it saves while the other threads keep crawling. This results in [CRAWL] logs interleaved with the [INFO] logs.

It seems like this is inefficient and could result in some saving errors.
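For illustration, here is a minimal sketch of the expected behavior, assuming a hypothetical `GlobalAutosave` helper that counts crawled links across all threads rather than per thread. It is not spidy's code: only `SAVE_COUNT` comes from the issue, while `start_crawl`, `finish_crawl`, `run_demo`, and the log strings are assumptions. Each worker registers before crawling a link; when the global count reaches `SAVE_COUNT`, the thread that crossed it blocks new crawls, waits for in-flight ones to finish, and then saves with no other thread crawling.

```python
import threading

SAVE_COUNT = 5  # hypothetical threshold: links crawled across ALL threads


class GlobalAutosave:
    """Sketch: pause every worker while a single global autosave runs."""

    def __init__(self, log):
        self.cond = threading.Condition()
        self.log = log          # shared list standing in for console output
        self.active = 0         # workers currently crawling a link
        self.since_save = 0     # links finished by all threads since the last save
        self.saving = False
        self.saves = 0

    def start_crawl(self):
        with self.cond:
            while self.saving:  # no new crawl may start during a save
                self.cond.wait()
            self.active += 1

    def finish_crawl(self):
        with self.cond:
            self.active -= 1
            self.since_save += 1
            if self.since_save >= SAVE_COUNT and not self.saving:
                self.saving = True
                self.since_save = 0
                while self.active > 0:  # let in-flight crawls drain first
                    self.cond.wait()
                self.save()
                self.saving = False
            self.cond.notify_all()

    def save(self):
        # Runs with the condition's lock held and no crawl in flight,
        # so no [CRAWL] line can appear between these two [INFO] lines.
        self.log.append("[INFO] Saving...")
        self.saves += 1
        self.log.append("[INFO] Saved.")


def worker(autosave, links):
    for link in links:
        autosave.start_crawl()
        autosave.log.append("[CRAWL] {}".format(link))  # stand-in for fetching
        autosave.finish_crawl()


def run_demo(num_threads=4, links_per_thread=10):
    autosave = GlobalAutosave(log=[])
    threads = [
        threading.Thread(
            target=worker,
            args=(autosave, ["http://example.com/{}/{}".format(t, i)
                             for i in range(links_per_thread)]),
        )
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return autosave
```

Holding the condition's lock for the whole save is the simplest way to freeze every thread; the trade-off is that all workers stall for as long as the files take to write.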

Steps to Reproduce the Problem

  1. Run crawler
  2. Wait for the autosave cap to be hit.

Specifications

  • Crawler Version: 1.6.2
  • Platform: Ubuntu (16.04 LTS)
  • Python version: 3.5.2
  • Dependency Versions: All latest.
@rivermont rivermont changed the title from "Autosave triggered" to "Autosave triggered by single thread and not global." on Oct 13, 2017
@Hrily
Collaborator

Hrily commented Oct 5, 2019

> It seems like this is inefficient and could result in some saving errors.

@rivermont Can you please elaborate on this?

I would like to understand all the cases where this will result in errors.

Hrily added a commit to Hrily/spidy that referenced this issue Oct 5, 2019
This commit fixes errors while autosaving by single thread. Specifically
it resolves the discrepancies in the contents saved.

Fixes rivermont#56
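The commit message does not say what the discrepancies were. One common cause, offered here purely as an assumption, is writing several shared lists to disk while other threads are still moving links between them, so the saved files disagree with each other. A minimal sketch of the usual remedy is to copy the shared state under one lock and write the copies afterwards; `state_lock`, `todo`, `done`, `mark_done`, `save_files`, and `demo` are hypothetical names, not spidy's.

```python
import os
import tempfile
import threading

# Hypothetical shared crawler state; spidy's real names and layout may differ.
state_lock = threading.Lock()
todo = ["http://example.com/a", "http://example.com/b"]
done = ["http://example.com/"]


def mark_done(url):
    # Move a link from todo to done atomically with respect to save_files().
    with state_lock:
        todo.remove(url)
        done.append(url)


def save_files(directory):
    # Copy both lists under the same lock so the two files describe one instant.
    # Writing each live list while other threads mutate it could save a link
    # in both files, or in neither.
    with state_lock:
        todo_snapshot = list(todo)
        done_snapshot = list(done)
    # Disk I/O happens outside the lock, so crawling threads are not blocked.
    for name, lines in (("todo.txt", todo_snapshot), ("done.txt", done_snapshot)):
        with open(os.path.join(directory, name), "w") as f:
            for line in lines:
                f.write(line + "\n")
    return len(todo_snapshot), len(done_snapshot)


def demo():
    mark_done("http://example.com/a")
    with tempfile.TemporaryDirectory() as tmp:
        counts = save_files(tmp)
        with open(os.path.join(tmp, "todo.txt")) as f:
            saved_todo = f.read().splitlines()
    return counts, saved_todo
```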
@Hrily Hrily mentioned this issue Oct 5, 2019
@Hrily
Collaborator

Hrily commented Oct 5, 2019

@rivermont

I found a few errors in autosaving and made a PR to fix them.

Also, I couldn't find a way to fix the logging with a minimal change. The logging logic may need to be revamped so that crawl logging is paused while saving.
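Pausing crawl logging during a save could look roughly like the sketch below. `PausableLog` and its methods are hypothetical, not part of spidy: while a save is in progress, [CRAWL] lines are buffered instead of emitted and are flushed in order once the save ends, so the [INFO] block stays contiguous even if other threads keep crawling.

```python
import threading
from contextlib import contextmanager


class PausableLog:
    """Hypothetical logger that holds back [CRAWL] lines while saving."""

    def __init__(self, out):
        self.lock = threading.Lock()
        self.out = out        # list-like sink with append/extend (a list here)
        self.paused = False
        self.pending = []     # [CRAWL] lines produced during a save

    def crawl(self, msg):
        with self.lock:
            line = "[CRAWL] " + msg
            if self.paused:
                self.pending.append(line)  # buffer until the save finishes
            else:
                self.out.append(line)

    def info(self, msg):
        with self.lock:
            self.out.append("[INFO] " + msg)

    @contextmanager
    def saving(self):
        with self.lock:
            self.paused = True
        try:
            yield
        finally:
            with self.lock:
                self.paused = False
                self.out.extend(self.pending)  # flush buffered lines in order
                self.pending.clear()
```

Usage: wrap the save in `with log.saving():` and emit the [INFO] lines inside it. Note that this only reorders output; the other threads still crawl during the save, so it addresses the interleaved logs but not any saving errors.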

pbnj added a commit to pbnj/spidy that referenced this issue Oct 9, 2023
fix Travis errors as on main branch

Fix Autosave errors

This commit fixes errors while autosaving by single thread. Specifically
it resolves the discrepancies in the contents saved.

Fixes rivermont#56

docs: update docker instructions

to specify how users can pass custom config to spidy in docker