Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ccxt Input Abstract Plugin #80

Merged
merged 26 commits into from
Oct 7, 2022
Merged

Conversation

macanudo527
Copy link
Collaborator

@macanudo527 macanudo527 commented Sep 4, 2022

This PR addresses issue #22 .

This only has _process_trades() functioning. I commented out code that I'll edit later.

Basically, I just want to submit this to see if this is the architecture you were thinking and if everything looks right. If it looks good to you, I'll continue on with deposits, withdrawals, and then the implicit API.

I think I can implement pagebased pagination from the Coinbase code, but idbased might be difficult to do without a real-world example.

Let me know what you think.

Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool. As I was reading the code I got a new design insight: I realized that pagination loop logic should probably be captured using custom iterators. It's a more natural fit and it's probably worth spending a little time exploring this approach. Custom iterators are already used extensively in the rest of the code. For a simple example of custom iterator see https://github.com/eprbell/rp2/blob/main/src/rp2/abstract_accounting_method.py.

The idea is that there is a container class (in the example it's AcquiredLotCandidates) we want to iterate on, so we define an __iter__ method which returns a custom iterator on that container (AccountingMethodIterator). The iterator class is custom defined: what makes AccountingMethodIterator a Python iterator is the __next__ method, which returns the next element (in this case it's an InTransaction). When there are no more elements to iterate on __next__ raises StopIteration.

So translating the above example to our case here:

  • AbstractPaginationDetails becomes AbstractPaginationDetailSet, DateBasedPaginationDetails becomes DateBasedPaginationDetailSet, etc. (to capture the idea that they are container classes)
  • Add "element" classes AbstractPaginationDetails, DateBasedPaginationDetails, etc. These are the elements contained in the container class and returned by the custom iterator (equivalent to InTransaction in the example above). These classes are very simple and contain only the fields necessary to do actual pagination: i.e. symbol, since, limit, parameters, etc. Perhaps these can be NamedTuples or dataclasses with frozen=true, to make sure they are read-only.
  • add def __iter__(self) -> "AbstractPaginationDetailsIterator" method to AbstractPaginationDetailSet (it's not implemented in the abstract class but only in the subclasses)
  • Add DateBasedPaginationDetailsIterator class. Its constructor receives all the information necessary to iterate: markets, current_results, etc. This class defines and updates iteration indexes
  • Add def __next__(self) -> DateBasedPaginationDetails: method to DateBasedPaginationDetailsIterator which returns the next object and updates indexes. If there are no more objects it returns StopIteration

With this approach the loop becomes something like:

pagination_detail_set: AbstractPaginationDetailSet = self.get_process_trades_pagination_detail_set(trades)
while pagination_details in pagination_detail_set:
    ...

I think this will pay off in terms of design clarity: we want to make this superclass as clean as possible because it will be used by every CCXT plugin.

I hope this all makes sense. Let me know if you have any questions or run into issues.

@macanudo527
Copy link
Collaborator Author

The iterator is going to need to know what comes back from the exchange in order to return the next "element". So, we will need to do something like this:

pagination_detail_set: AbstractPaginationDetailSet = self.get_process_trades_pagination_detail_set(trades)
while pagination_details in pagination_detail_set:
            trades = self.__client.fetch_my_trades(
                symbol=pagination_details.symbol,
                since=pagination_details.since,
                limit=pagination_details.limit,
                params=pagination_details.parameters,
            )
        
     pagination_detail_set.updateIterator(trades) # This isn't possible right?

The iterator will need to know the live data coming back from the server in order to calculate the next date range. If there are more records and we hit the limit, we need to pull the window back. We will not know that information until we start pulling the data. So, is it possible to update the iterator in the middle of the iteration?

@eprbell
Copy link
Owner

eprbell commented Sep 5, 2022

That's a really good point. To solve the problem I think we need to use the lower-level iterator API. Here's what I mean:

        pagination_detail_set: AbstractPaginationDetailSet = self.get_process_trades_pagination_detail_set()
        pagination_detail_iterator: AbstractPaginationDetailsIterator = iter(pagination_detail_set)
        try:
            while True:
                pagination_details = next(pagination_detail_iterator)
                trades = self.__client.fetch_my_trades(
                    symbol=pagination_details.symbol,
                    since=pagination_details.since,
                    limit=pagination_details.limit,
                    params=pagination_details.parameters,
                )
                pagination_detail_iterator.update_fetched_elements(trades)
        except StopIteration:
            # End of pagination details
            pass

Note that AbstractPaginationDetailSet and AbstractPaginationDetailsIterator remain as described in my previous comment, with the exception that update_fetched_elements needs to be added to AbstractPaginationDetailsIterator

Examples of this pattern in the code:

@macanudo527
Copy link
Collaborator Author

It looks like the failed checks are due to a server outage.

@eprbell
Copy link
Owner

eprbell commented Sep 5, 2022

We may want to mock test_plugin_ccxt.py to avoid these problems.

@macanudo527
Copy link
Collaborator Author

I'm wondering what to name symbol in the pagination sets/iterators. Basically, we need to cycle through markets (eg. BTC/USD) for trades but codes/symbols (eg. BTC) of crypto assets for deposits/withdrawals.

What would be a good clear name? Sector, grouping?

@eprbell
Copy link
Owner

eprbell commented Sep 6, 2022

No strong opinion: perhaps symbol or product. We can also rename it later if we think of a better name.

@macanudo527
Copy link
Collaborator Author

I have to do some cleaning, but core functionality should be available. I"m going to work on the implicit API and how to integrate that smoothly.

@eprbell
Copy link
Owner

eprbell commented Sep 7, 2022

Hm, one more thing: it looks like CCXT may not be thread-safe (see ccxt/ccxt#2252)

Two potential solutions:

  • switch from thread pool to process pool. This may be a good idea regardless, because it reduces thread safety risks in any future CCXT clients we decide to support. Doing this is trivial: instead of with ThreadPool(self.__thread_count) as pool: use with Pool(self.__thread_count) as pool: (and update the import accordingly)
  • nonce overriding approach (explained in the CCXT thread above)

Anyway, let's keep the parallel infrastructure in and keep exploring it: if we don't find any way of making CCXT work in parallel, we'll hardcode n_threads=1 (or remove the multithreading code), but I'm cautiously optimistic we can apply one of the above solutions successfully.

@macanudo527
Copy link
Collaborator Author

Thinking about the implicit API. Do you think it would be better if the abstract class uses dir() to find implicit functions and then cycles through them. Something like this:

implicit_api_list = [method for method in dir(ExchangePlugin) if method.startswith('__') is False and method.startswith('_implicit_']
for implicit_function in method_list:
 # make I/O call

Each implicit API function would also need a custom PaginationDetailSet and a post_processing() function that would sort the json into the corresponding transactions after it is retrieved from the exchange, but these could be implied from the end of the API function. Given a function _implicit_sapiGetAssetAssetDividend, the abstract plugin would look for _sapiGetAssetAssetDividendDetailSet and _sapiGetAssetAssetDividendPostProcessing functions to match.

Do you think this is too complicated? or should there simply be a _process_implicit_api() function that must be overridden in the inherited class? and users can decide how to handle things.

I guess another option would be to have a List of implicit functions that the abstract plugin can iterate over and the iterator and post-processor can be implied from that.

@eprbell
Copy link
Owner

eprbell commented Sep 8, 2022

The two approaches are not mutually exclusive: I was thinking we could do the simple approach first (_process_implicit_api), to get to MVP faster. Later maybe we can explore the other option and let the subclass decide which path to take.

@macanudo527
Copy link
Collaborator Author

Ok, I added in gains. I made it a separate function from _process_implicit_api() to keep it better sorted. Also, I hope sometime in the future CCXT will implement it as one of their unified functions. I wrote a CustomDatePaginationDetailSet to handle the processing for these, but ended up not implementing it since I don't want to mess with the very delicate subscription/redemption calculation I have. I'd like to just get to an MVP as soon as I can.

@macanudo527
Copy link
Collaborator Author

I just need to add dust trades, and then I should be able to wrap this up.

@macanudo527
Copy link
Collaborator Author

OK, this is all wrapped up. Should I add something to the docs?

@eprbell
Copy link
Owner

eprbell commented Sep 13, 2022

Great! I'll start reviewing. Yes, you can add reference to the abstract class to https://github.com/eprbell/dali-rp2/blob/main/README.dev.md#data-loader-plugin-development

Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes.

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
@macanudo527
Copy link
Collaborator Author

macanudo527 commented Sep 28, 2022

Could you get back to me on the questions above? re: ProcessOperationResult, Trade, exchange.has[_FETCHMYTRADES], and _to_trade. And then, I think we are about finished.

@macanudo527
Copy link
Collaborator Author

macanudo527 commented Sep 29, 2022

For _process_gains() and _process_implicit_api() should these just have pass in them in subclasses if the exchange doesn't require them? or have pass in the superclass and users can override the function if necessary?

I like explicit behavior even if it's a little more work, so I would say leave it fully abstract in the superclass (no pass) and force the subclass writer to add small methods with pass.

@eprbell
Copy link
Owner

eprbell commented Sep 29, 2022

exchange.has[

Oh, sorry if I missed anything... I can't find any unresolved and unanswered questions in this PR (but Github can be a bit cumbersome with its folding of content): could you repeat the questions here?

@macanudo527
Copy link
Collaborator Author

Huh, that's strange. It lists these all unresolved for me. Here they are:

You commented about making Trade and ProcessOperationResult file-private:

This should be file-private and have a leading underscore: _ProcessOperationResult

However, that would make them unavailable in the subclass right? I have the subclass setup to return results from single transaction processes with this tuple to support multi-threading.

I also use _to_trade in the subclasses to split apart the symbol into trade info that uses Trade

Also, my comment about renaming _to_trade:

I'm not that familiar with Python naming, but basically what this function is doing is parsing 3 variables into more useful information, so I think _to makes more sense. It is performing a conversion with the inputs not retrieving anything, which is what _get implies.

My comments on .has[]:

Should the check be inside _process_deposits? I'm thinking of the case of a subclass which may be adding functionality to _process_deposits: it would call super._process_deposits and then add its custom logic. However if the check is outside, the subclass custom logic will never be called.

Same thing applies to the other checks in the lines below.

If self.__client.has[_FETCH_DEPOSITS] returns false the CCXT subclass does not have a fetchDeposits function, so the subclass would be calling an implicit API, which should be under _process_implicit_api(), right?

I guess I saw these functions just covering the basic unified API and anything else needs to be covered in _process_implicit_api() or _process_gains(). What would be the use case for overriding _process_deposits?

Also, I don't think it is a good idea to encourage these to be overwritten since it might make mult-threading more complicated. If we know these only cover the unified API it will be easier to optimize and prevent race conditions.

@eprbell
Copy link
Owner

eprbell commented Sep 30, 2022

Huh, that's strange. It lists these all unresolved for me. Here they are:

I found the comments in file view (still not in the conversation view), but your reply isn't there: did you submit your comments? In any case I'll reply here.

You commented about making Trade and ProcessOperationResult file-private:

This should be file-private and have a leading underscore: _ProcessOperationResult

However, that would make them unavailable in the subclass right? I have the subclass setup to return results from single transaction processes with this tuple to support multi-threading.

True: moot point, then.

I also use _to_trade in the subclasses to split apart the symbol into trade info that uses Trade

Also, my comment about renaming _to_trade:

I'm not that familiar with Python naming, but basically what this function is doing is parsing 3 variables into more useful information, so I think _to makes more sense. It is performing a conversion with the inputs not retrieving anything, which is what _get implies.

That's fine: it was just a suggestion and it's a matter of personal preference. Let's keep _to.

My comments on .has[]:

Should the check be inside _process_deposits? I'm thinking of the case of a subclass which may be adding functionality to _process_deposits: it would call super._process_deposits and then add its custom logic. However if the check is outside, the subclass custom logic will never be called.

Same thing applies to the other checks in the lines below.

If self.__client.has[_FETCH_DEPOSITS] returns false the CCXT subclass does not have a fetchDeposits function, so the subclass would be calling an implicit API, which should be under _process_implicit_api(), right?

Another good point. Ok, let's leave it as is, then.

I guess I saw these functions just covering the basic unified API and anything else needs to be covered in _process_implicit_api() or _process_gains(). What would be the use case for overriding _process_deposits?

Yeah, I was thinking the user might add implicit "deposit" APIs to their own version of _process_deposit(), but I think you're right: they would just add them to _process_implicit_api(), which is a simpler model.

Also, I don't think it is a good idea to encourage these to be overwritten since it might make mult-threading more complicated. If we know these only cover the unified API it will be easier to optimize and prevent race conditions.

So it sounds like we can ignore all of these comments and move on. I'll review everything once more and then we can merge.

Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor fixes needed.

src/dali/abstract_ccxt_input_plugin.py Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
try:
while True:
pagination_details: PaginationDetails = next(pagination_detail_iterator)
trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on this?

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
Comment on lines 95 to 96
self._client: Exchange = self._initialize_client()
self._thread_count = thread_count if thread_count else self.__DEFAULT_THREAD_COUNT
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fields still need to start with double underscore (our data models are encapsulated and we never access fields directly outside the class in which they are defined). To implement protected behavior they need a property that starts with single underscore (to access in the subclass). Sorry if my description wasn't clear when we discussed this.

Here's an updated text that's hopefully clearer:

  • all class fields have two leading underscores because our data models are encapsulated and we never access fields directly outside the class in which they are defined
  • private fields: visible only to the class, nothing else to do (access within the class via the double-underscore filed)
  • protected fields: visible to the class and to its subclasses, needs to be encapsulated with a property or an accessor function that starts with one leading underscore
  • public fields: visible to everybody, needs to be encapsulated with a property or acccessor function that starts with no leading underscore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately we keep going around and around on this. I guess I understand the basic concept, but how does this look when implemented? So for client and logger, I should have double underscore:

self.__client
self.__thread_count

And then, property accessors:

@property
def client(self):
    return self.__client

@property
def thread_count(self):
    return self.__thread_count

Or do I use _get accessors?

(These fields are used in the subclasses.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, actual fields have leading double underscore:

self.__client
self.__thread_count

Then:

  • if we want private access (this class only) we do nothing else
  • if we want protected access (this class and its subclasses), we add a property with a single underscore (or an accessor method starting with _get)
  • if we want public access (everybody), we add a property with no underscore (or an accessor method starting with get)

Whether to use a property or a get function is up to you: I tend to use properties if the access logic is fairly simple and get functions if access logic is more complicated. But again, either way works. Hope this is clearer, sorry if I didn't articulate it clearly enough earlier.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a data encapsulation paragraph (with the above description) to the Design Guidelines section of README.dev.md both in RP2 and DaLI. Let me know if you think it needs more explanation: I'd like to fix the docs so that nobody trips on it in the future.

@macanudo527
Copy link
Collaborator Author

I just want to double check that you can see my most recent comments on __client and __thread_count. As well as the one about floats in CCXT.

@eprbell
Copy link
Owner

eprbell commented Oct 4, 2022

Unfortunately I cannot :-(

When you look at your comments do they show up as "pending"? If so you need to submit them to make them publicly visible. Not sure what else this could be, short of a site malfunction...

Copy link
Collaborator Author

@macanudo527 macanudo527 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now, you should see all comments, right?

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
try:
while True:
pagination_details: PaginationDetails = next(pagination_detail_iterator)
trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately.

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
Comment on lines 95 to 96
self._client: Exchange = self._initialize_client()
self._thread_count = thread_count if thread_count else self.__DEFAULT_THREAD_COUNT
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unfortunately we keep going around and around on this. I guess I understand the basic concept, but how does this look when implemented? So for client and logger, I should have double underscore:

self.__client
self.__thread_count

And then, property accessors:

@property
def client(self):
    return self.__client

@property
def thread_count(self):
    return self.__thread_count

Or do I use _get accessors?

(These fields are used in the subclasses.)

try:
while True:
pagination_details: PaginationDetails = next(pagination_detail_iterator)
trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately. CCXT outputs floats for some reason. It also accepts floats for its other functions for amounts.

Huh, so you don't see my other comment on this? That's so bizarre, it's like Github made them all disappear for you.

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
@macanudo527
Copy link
Collaborator Author

Ok, now, everything should show. I have no idea how I have used Github for so long and not realized you have to submit all your comments. I have no idea what I was doing before.

Sorry about that.

@eprbell
Copy link
Owner

eprbell commented Oct 4, 2022

Now I see it: mystery solved :-) Will take a look in the next 2-3 days (currently traveling).

Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I replied to all of the comments. Let me know if anything is still missing.

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More answers.

Comment on lines 95 to 96
self._client: Exchange = self._initialize_client()
self._thread_count = thread_count if thread_count else self.__DEFAULT_THREAD_COUNT
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, actual fields have leading double underscore:

self.__client
self.__thread_count

Then:

  • if we want private access (this class only) we do nothing else
  • if we want protected access (this class and its subclasses), we add a property with a single underscore (or an accessor method starting with _get)
  • if we want public access (everybody), we add a property with no underscore (or an accessor method starting with get)

Whether to use a property or a get function is up to you: I tend to use properties if the access logic is fairly simple and get functions if access logic is more complicated. But again, either way works. Hope this is clearer, sorry if I didn't articulate it clearly enough earlier.

try:
while True:
pagination_details: PaginationDetails = next(pagination_detail_iterator)
trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good.

@macanudo527
Copy link
Collaborator Author

I think I more or less wrapped everything up. I overrode the _client property to type it for mypy. Let me know if that is the right way to do it.

Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typos. I'll do one more full review once those are fixed and then we'll merge.

src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/abstract_ccxt_input_plugin.py Outdated Show resolved Hide resolved
src/dali/plugin/input/rest/binance_com.py Show resolved Hide resolved
Copy link
Owner

@eprbell eprbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@eprbell
Copy link
Owner

eprbell commented Oct 7, 2022

Grats! This is an important merge.

@eprbell eprbell merged commit f4d4def into eprbell:main Oct 7, 2022
@macanudo527
Copy link
Collaborator Author

It's good to see this go through! Thanks for all the help on the differences with protected / private.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants