Conversation

@ManiMozaffar
Member

No description provided.

@ManiMozaffar ManiMozaffar changed the base branch from main to develop July 27, 2023 17:00
@ManiMozaffar ManiMozaffar linked an issue Jul 29, 2023 that may be closed by this pull request
5 tasks
@ManiMozaffar ManiMozaffar marked this pull request as ready for review August 19, 2023 13:29
@ManiMozaffar ManiMozaffar changed the title from "Fix bug in core" to "Improve FastCrawler core and synced to FastAPI backend client" Aug 19, 2023
@ManiMozaffar ManiMozaffar changed the title from "Improve FastCrawler core and synced to FastAPI backend client" to "Refactor FastCrawler core codebase and synced to FastAPI backend client" Aug 19, 2023
@ManiMozaffar ManiMozaffar changed the title from "Refactor FastCrawler core codebase and synced to FastAPI backend client" to "♻️ Refactor FastCrawler core codebase and synced to FastAPI backend client" Aug 19, 2023
@ManiMozaffar ManiMozaffar merged commit 374ae77 into develop Aug 19, 2023
from .process import Process


def list_process(crawlers: list[Process] | Process) -> list[Process]:
Contributor

bad naming
bad design

Member Author

Will be fixed in the next PR.


@property
def get_all_serves(self) -> list[Callable]:
def get_all_serves(self) -> list[Coroutine[Any, Any, None]]:
Contributor

we don't use Any

Member Author

What type should I use then?
I'd have to look into uvicorn to check the protocol, then define the servable coroutine type.
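
One possible way to avoid `Any` here, sketched under assumptions: if callers only ever await the returned objects, `Awaitable[None]` may be enough. The alias `Servable` and the stand-in `serve` coroutine below are hypothetical; they are not taken from FastCrawler or from uvicorn.

```python
import asyncio
from collections.abc import Awaitable

# Hypothetical alias: something awaitable that resolves to None.
Servable = Awaitable[None]


async def serve(name: str, started: list[str]) -> None:
    """Stand-in for a server's serve coroutine; not uvicorn's real API."""
    await asyncio.sleep(0)
    started.append(name)


def get_all_serves(started: list[str]) -> list[Servable]:
    """Return un-awaited coroutines, typed without Any."""
    return [serve("api", started), serve("metrics", started)]


async def run_all() -> list[str]:
    """Await every servable concurrently and report which ones ran."""
    started: list[str] = []
    await asyncio.gather(*get_all_serves(started))
    return started
```

If the code later needs `send()` or `throw()` on the coroutine objects, this alias would be too loose and a narrower type would have to be worked out from the actual server protocol.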

self.task = Task(
start_cond=cond or "every 1 second",
name=spider.__class__.__name__ + str(uuid4()),
name=f"{uuid4()}@{spider.__class__.__name__}",
Contributor

this needs lots of comments and examples

Member Author

Will add doc src in future.


@property
def is_stopped(self) -> set:
def instances(self) -> list[Self | "Spider"]:
Contributor

wrong encapsulation

Member Author

True, will change that in the next PR.

self._instances = [self]
return self._instances

@instances.setter
Contributor

bad naming

Member Author

what do you mean? the instances?
any suggestion?
Like linked_spiders?

Contributor

spiders, maybe? I don't know, you decide.

if response
]
await self.save(results)
for idx in range(0, len(urls), self.get_batch_size):
Contributor

too nested

Member Author

@ManiMozaffar ManiMozaffar Aug 19, 2023

Indeed, I've written to you in the past that I don't like this part of the code.
All of it would go into a Batch class.
I haven't prototyped it yet.
Currently Vahid will be working on core, on the new feature that we discussed yesterday.
Then I'll kick off implementing the Batch class.
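
Since the Batch class has not been prototyped, the following is not its design; it is only a rough sketch of how the slicing by batch size could be pulled out of the nested loop. The name `iter_batches` is hypothetical.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]
```

With a helper like this, the crawling loop could iterate over `iter_batches(urls, batch_size)` and keep the request and save logic one level shallower.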

]
)
return {url: result for url, result in zip(tasks.keys(), results)}
tasks = []
Contributor

use list() instead of []

Member Author

That's not correct. Using the language construct is always better.
Linters prefer the language construct,
and it's also better in terms of performance.
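
A small sketch of where the difference comes from on CPython: the literal compiles to a `BUILD_LIST` instruction, while `list()` has to look up the name `list` and then call it. The helper name `opnames` is made up for this example, and exact opcode names vary between Python versions.

```python
import dis


def opnames(expression: str) -> list[str]:
    """Return the bytecode instruction names for a single expression."""
    code = compile(expression, "<example>", "eval")
    return [instruction.opname for instruction in dis.Bytecode(code)]


literal_ops = opnames("[]")      # contains BUILD_LIST
call_ops = opnames("list()")     # contains a name lookup plus a call
```

The gap is small in absolute terms, so it mostly matters in hot loops; the readability and linter argument applies everywhere.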

port: int
username: str | None = None
password: str | None = None
protocol: str = "http://"
Contributor

this should be an enum

Member Author

Will be done in another PR.
By the way, what are the possibilities? I should research that.
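
A minimal sketch of what the enum could look like. The member list is an assumption, since the set of supported protocols is still an open question in this thread; `ProxyProtocol`, `ProxySettings`, and the `host` field are hypothetical names, not the project's actual ones.

```python
from dataclasses import dataclass
from enum import Enum


class ProxyProtocol(str, Enum):
    """Assumed protocol prefixes; the real list needs research."""

    HTTP = "http://"
    HTTPS = "https://"
    SOCKS5 = "socks5://"


@dataclass
class ProxySettings:
    host: str  # hypothetical field, not shown in the diff
    port: int
    protocol: ProxyProtocol = ProxyProtocol.HTTP

    def url(self) -> str:
        """Build the proxy URL from the enum value, host, and port."""
        return f"{self.protocol.value}{self.host}:{self.port}"
```

Using `.value` explicitly avoids relying on how mixed-in enums are formatted, which differs between Python versions.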



@dataclass
class RequestCycle:
Contributor

Request and Response are easy concepts,
but anything Cycle-related needs lots of docstrings, comments, and examples

Member Author

Sure, will do, in another PR.

urls_1 = [Request(url=url) for url in (["http://127.0.0.1:8000/throttled/3/"] * 2)]
urls_2 = [Request(url=url) for url in (["http://127.0.0.1:8000/throttled/5/"] * 1)]
urls = [
Request(url="http://127.0.0.1:8000/throttled", data={"seconds": 0.01}),
Contributor

you need a variable for this url

Member Author

Will be fixed in the next PR.
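
One way the repeated URL could be factored out in the tests, as a sketch only. The constant and helper names are hypothetical; the address is the one already used in the test snippet above.

```python
# Hypothetical test constants; the host and path come from the snippet above.
BASE_URL = "http://127.0.0.1:8000"
THROTTLED_URL = f"{BASE_URL}/throttled"


def throttled_url(seconds: int) -> str:
    """Build the path-style throttled endpoint, e.g. .../throttled/3/."""
    return f"{THROTTLED_URL}/{seconds}/"
```

The requests in the test could then be built from `THROTTLED_URL` or `throttled_url(3)` instead of repeating the literal string.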

@amirbahador-hub amirbahador-hub deleted the bug/core branch February 15, 2024 09:58
Development

Successfully merging this pull request may close these issues.

Required features for core are missing

4 participants