-
Notifications
You must be signed in to change notification settings - Fork 2
♻️ Refactor FastCrawler core codebase and synced to FastAPI backend client #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| from .process import Process | ||
|
|
||
|
|
||
| def list_process(crawlers: list[Process] | Process) -> list[Process]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad naming
bad design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be fixed in next PR.
|
|
||
| @property | ||
| def get_all_serves(self) -> list[Callable]: | ||
| def get_all_serves(self) -> list[Coroutine[Any, Any, None]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use Any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What type I should use then?
I have to look up in uvicorn to check the protocol, then define the servable coroutine.
| self.task = Task( | ||
| start_cond=cond or "every 1 second", | ||
| name=spider.__class__.__name__ + str(uuid4()), | ||
| name=f"{uuid4()}@{spider.__class__.__name__}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs lots of comments and examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add doc src in future.
|
|
||
| @property | ||
| def is_stopped(self) -> set: | ||
| def instances(self) -> list[Self | "Spider"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong encapsulation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, will change that in next PR
| self._instances = [self] | ||
| return self._instances | ||
|
|
||
| @instances.setter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean? the instances?
any suggestion?
Like linked_spiders?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spiders maybe? i don't know you decide
| if response | ||
| ] | ||
| await self.save(results) | ||
| for idx in range(0, len(urls), self.get_batch_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too nested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, I've wrote to you in past that I don't like this part of code.
All would goo to Batch class.
I haven't prototyped it yet.
Currently vahid would be working on core, with that new feature that we discussed yesterday.
Then i'll kick off implementing the Batch class.
| ] | ||
| ) | ||
| return {url: result for url, result in zip(tasks.keys(), results)} | ||
| tasks = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use list() instead of []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not correct. using language construct is always better.
Linters prefer language construct
and also it's better in term of perfomance
| port: int | ||
| username: str | None = None | ||
| password: str | None = None | ||
| protocol: str = "http://" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be an enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be done in another pr.
btw, what are the possibilities? I should research.
|
|
||
|
|
||
| @dataclass | ||
| class RequestCycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request and Response are easy conecepts
but anything Cycle related needs lots of doc strings and comments and examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do. in another PR
| urls_1 = [Request(url=url) for url in (["http://127.0.0.1:8000/throttled/3/"] * 2)] | ||
| urls_2 = [Request(url=url) for url in (["http://127.0.0.1:8000/throttled/5/"] * 1)] | ||
| urls = [ | ||
| Request(url="http://127.0.0.1:8000/throttled", data={"seconds": 0.01}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need a variable for this url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be fix in next PR
No description provided.