Ccxt Input Abstract Plugin #80

macanudo527 · 2022-09-04T11:12:55Z

This PR addresses issue #22 .

This only has _process_trades() functioning. I commented out code that I'll edit later.

Basically, I just want to submit this to see if this is the architecture you were thinking and if everything looks right. If it looks good to you, I'll continue on with deposits, withdrawals, and then the implicit API.

I think I can implement pagebased pagination from the Coinbase code, but idbased might be difficult to do without a real-world example.

Let me know what you think.

eprbell

This is cool. As I was reading the code I got a new design insight: I realized that pagination loop logic should probably be captured using custom iterators. It's a more natural fit and it's probably worth spending a little time exploring this approach. Custom iterators are already used extensively in the rest of the code. For a simple example of custom iterator see https://github.com/eprbell/rp2/blob/main/src/rp2/abstract_accounting_method.py.

The idea is that there is a container class (in the example it's AcquiredLotCandidates) we want to iterate on, so we define an __iter__ method which returns a custom iterator on that container (AccountingMethodIterator). The iterator class is custom defined: what makes AccountingMethodIterator a Python iterator is the __next__ method, which returns the next element (in this case it's an InTransaction). When there are no more elements to iterate on __next__ raises StopIteration.

So translating the above example to our case here:

AbstractPaginationDetails becomes AbstractPaginationDetailSet, DateBasedPaginationDetails becomes DateBasedPaginationDetailSet, etc. (to capture the idea that they are container classes)
Add "element" classes AbstractPaginationDetails, DateBasedPaginationDetails, etc. These are the elements contained in the container class and returned by the custom iterator (equivalent to InTransaction in the example above). These classes are very simple and contain only the fields necessary to do actual pagination: i.e. symbol, since, limit, parameters, etc. Perhaps these can be NamedTuples or dataclasses with frozen=true, to make sure they are read-only.
add def __iter__(self) -> "AbstractPaginationDetailsIterator" method to AbstractPaginationDetailSet (it's not implemented in the abstract class but only in the subclasses)
Add DateBasedPaginationDetailsIterator class. Its constructor receives all the information necessary to iterate: markets, current_results, etc. This class defines and updates iteration indexes
Add def __next__(self) -> DateBasedPaginationDetails: method to DateBasedPaginationDetailsIterator which returns the next object and updates indexes. If there are no more objects it returns StopIteration

With this approach the loop becomes something like:

pagination_detail_set: AbstractPaginationDetailSet = self.get_process_trades_pagination_detail_set(trades)
while pagination_details in pagination_detail_set:
    ...

I think this will pay off in terms of design clarity: we want to make this superclass as clean as possible because it will be used by every CCXT plugin.

I hope this all makes sense. Let me know if you have any questions or run into issues.

macanudo527 · 2022-09-05T02:25:12Z

The iterator is going to need to know what comes back from the exchange in order to return the next "element". So, we will need to do something like this:

pagination_detail_set: AbstractPaginationDetailSet = self.get_process_trades_pagination_detail_set(trades)
while pagination_details in pagination_detail_set:
            trades = self.__client.fetch_my_trades(
                symbol=pagination_details.symbol,
                since=pagination_details.since,
                limit=pagination_details.limit,
                params=pagination_details.parameters,
            )
        
     pagination_detail_set.updateIterator(trades) # This isn't possible right?

The iterator will need to know the live data coming back from the server in order to calculate the next date range. If there are more records and we hit the limit, we need to pull the window back. We will not know that information until we start pulling the data. So, is it possible to update the iterator in the middle of the iteration?

eprbell · 2022-09-05T02:57:20Z

That's a really good point. To solve the problem I think we need to use the lower-level iterator API. Here's what I mean:

        pagination_detail_set: AbstractPaginationDetailSet = self.get_process_trades_pagination_detail_set()
        pagination_detail_iterator: AbstractPaginationDetailsIterator = iter(pagination_detail_set)
        try:
            while True:
                pagination_details = next(pagination_detail_iterator)
                trades = self.__client.fetch_my_trades(
                    symbol=pagination_details.symbol,
                    since=pagination_details.since,
                    limit=pagination_details.limit,
                    params=pagination_details.parameters,
                )
                pagination_detail_iterator.update_fetched_elements(trades)
        except StopIteration:
            # End of pagination details
            pass

Note that AbstractPaginationDetailSet and AbstractPaginationDetailsIterator remain as described in my previous comment, with the exception that update_fetched_elements needs to be added to AbstractPaginationDetailsIterator

Examples of this pattern in the code:

example of use of iter to obtain an iterator object: https://github.com/eprbell/rp2/blob/main/src/rp2/tax_engine.py#L97
example of use of next on the iterator and catching StopIteration: https://github.com/eprbell/rp2/blob/main/src/rp2/accounting_engine.py#L102

macanudo527 · 2022-09-05T13:44:27Z

It looks like the failed checks are due to a server outage.

eprbell · 2022-09-05T16:04:24Z

We may want to mock test_plugin_ccxt.py to avoid these problems.

macanudo527 · 2022-09-06T02:35:24Z

I'm wondering what to name symbol in the pagination sets/iterators. Basically, we need to cycle through markets (eg. BTC/USD) for trades but codes/symbols (eg. BTC) of crypto assets for deposits/withdrawals.

What would be a good clear name? Sector, grouping?

eprbell · 2022-09-06T09:24:03Z

No strong opinion: perhaps symbol or product. We can also rename it later if we think of a better name.

macanudo527 · 2022-09-07T07:43:14Z

I have to do some cleaning, but core functionality should be available. I"m going to work on the implicit API and how to integrate that smoothly.

eprbell · 2022-09-07T13:33:56Z

Hm, one more thing: it looks like CCXT may not be thread-safe (see ccxt/ccxt#2252)

Two potential solutions:

switch from thread pool to process pool. This may be a good idea regardless, because it reduces thread safety risks in any future CCXT clients we decide to support. Doing this is trivial: instead of with ThreadPool(self.__thread_count) as pool: use with Pool(self.__thread_count) as pool: (and update the import accordingly)
nonce overriding approach (explained in the CCXT thread above)

Anyway, let's keep the parallel infrastructure in and keep exploring it: if we don't find any way of making CCXT work in parallel, we'll hardcode n_threads=1 (or remove the multithreading code), but I'm cautiously optimistic we can apply one of the above solutions successfully.

src/dali/abstract_ccxt_input_plugin.py

macanudo527 · 2022-09-08T00:24:04Z

Thinking about the implicit API. Do you think it would be better if the abstract class uses dir() to find implicit functions and then cycles through them. Something like this:

implicit_api_list = [method for method in dir(ExchangePlugin) if method.startswith('__') is False and method.startswith('_implicit_']
for implicit_function in method_list:
 # make I/O call

Each implicit API function would also need a custom PaginationDetailSet and a post_processing() function that would sort the json into the corresponding transactions after it is retrieved from the exchange, but these could be implied from the end of the API function. Given a function _implicit_sapiGetAssetAssetDividend, the abstract plugin would look for _sapiGetAssetAssetDividendDetailSet and _sapiGetAssetAssetDividendPostProcessing functions to match.

Do you think this is too complicated? or should there simply be a _process_implicit_api() function that must be overridden in the inherited class? and users can decide how to handle things.

I guess another option would be to have a List of implicit functions that the abstract plugin can iterate over and the iterator and post-processor can be implied from that.

eprbell · 2022-09-08T07:12:44Z

The two approaches are not mutually exclusive: I was thinking we could do the simple approach first (_process_implicit_api), to get to MVP faster. Later maybe we can explore the other option and let the subclass decide which path to take.

macanudo527 · 2022-09-11T12:55:05Z

Ok, I added in gains. I made it a separate function from _process_implicit_api() to keep it better sorted. Also, I hope sometime in the future CCXT will implement it as one of their unified functions. I wrote a CustomDatePaginationDetailSet to handle the processing for these, but ended up not implementing it since I don't want to mess with the very delicate subscription/redemption calculation I have. I'd like to just get to an MVP as soon as I can.

macanudo527 · 2022-09-12T14:26:32Z

I just need to add dust trades, and then I should be able to wrap this up.

macanudo527 · 2022-09-13T01:58:16Z

OK, this is all wrapped up. Should I add something to the docs?

eprbell · 2022-09-13T05:32:28Z

Great! I'll start reviewing. Yes, you can add reference to the abstract class to https://github.com/eprbell/dali-rp2/blob/main/README.dev.md#data-loader-plugin-development

eprbell

Minor changes.

src/dali/abstract_ccxt_input_plugin.py

macanudo527 · 2022-09-28T23:38:39Z

Could you get back to me on the questions above? re: ProcessOperationResult, Trade, exchange.has[_FETCHMYTRADES], and _to_trade. And then, I think we are about finished.

macanudo527 · 2022-09-29T00:08:00Z

For _process_gains() and _process_implicit_api() should these just have pass in them in subclasses if the exchange doesn't require them? or have pass in the superclass and users can override the function if necessary?

I like explicit behavior even if it's a little more work, so I would say leave it fully abstract in the superclass (no pass) and force the subclass writer to add small methods with pass.

eprbell · 2022-09-29T05:59:42Z

exchange.has[

Oh, sorry if I missed anything... I can't find any unresolved and unanswered questions in this PR (but Github can be a bit cumbersome with its folding of content): could you repeat the questions here?

macanudo527 · 2022-09-29T14:10:02Z

Huh, that's strange. It lists these all unresolved for me. Here they are:

You commented about making Trade and ProcessOperationResult file-private:

This should be file-private and have a leading underscore: _ProcessOperationResult

However, that would make them unavailable in the subclass right? I have the subclass setup to return results from single transaction processes with this tuple to support multi-threading.

I also use _to_trade in the subclasses to split apart the symbol into trade info that uses Trade

Also, my comment about renaming _to_trade:

I'm not that familiar with Python naming, but basically what this function is doing is parsing 3 variables into more useful information, so I think _to makes more sense. It is performing a conversion with the inputs not retrieving anything, which is what _get implies.

My comments on .has[]:

Should the check be inside _process_deposits? I'm thinking of the case of a subclass which may be adding functionality to _process_deposits: it would call super._process_deposits and then add its custom logic. However if the check is outside, the subclass custom logic will never be called.

Same thing applies to the other checks in the lines below.

If self.__client.has[_FETCH_DEPOSITS] returns false the CCXT subclass does not have a fetchDeposits function, so the subclass would be calling an implicit API, which should be under _process_implicit_api(), right?

I guess I saw these functions just covering the basic unified API and anything else needs to be covered in _process_implicit_api() or _process_gains(). What would be the use case for overriding _process_deposits?

Also, I don't think it is a good idea to encourage these to be overwritten since it might make mult-threading more complicated. If we know these only cover the unified API it will be easier to optimize and prevent race conditions.

eprbell · 2022-09-30T08:04:16Z

Huh, that's strange. It lists these all unresolved for me. Here they are:

I found the comments in file view (still not in the conversation view), but your reply isn't there: did you submit your comments? In any case I'll reply here.

You commented about making Trade and ProcessOperationResult file-private:

This should be file-private and have a leading underscore: _ProcessOperationResult

However, that would make them unavailable in the subclass right? I have the subclass setup to return results from single transaction processes with this tuple to support multi-threading.

True: moot point, then.

I also use _to_trade in the subclasses to split apart the symbol into trade info that uses Trade

Also, my comment about renaming _to_trade:

I'm not that familiar with Python naming, but basically what this function is doing is parsing 3 variables into more useful information, so I think _to makes more sense. It is performing a conversion with the inputs not retrieving anything, which is what _get implies.

That's fine: it was just a suggestion and it's a matter of personal preference. Let's keep _to.

My comments on .has[]:

Should the check be inside _process_deposits? I'm thinking of the case of a subclass which may be adding functionality to _process_deposits: it would call super._process_deposits and then add its custom logic. However if the check is outside, the subclass custom logic will never be called.

Same thing applies to the other checks in the lines below.

If self.__client.has[_FETCH_DEPOSITS] returns false the CCXT subclass does not have a fetchDeposits function, so the subclass would be calling an implicit API, which should be under _process_implicit_api(), right?

Another good point. Ok, let's leave it as is, then.

I guess I saw these functions just covering the basic unified API and anything else needs to be covered in _process_implicit_api() or _process_gains(). What would be the use case for overriding _process_deposits?

Yeah, I was thinking the user might add implicit "deposit" APIs to their own version of _process_deposit(), but I think you're right: they would just add them to _process_implicit_api(), which is a simpler model.

Also, I don't think it is a good idea to encourage these to be overwritten since it might make mult-threading more complicated. If we know these only cover the unified API it will be easier to optimize and prevent race conditions.

So it sounds like we can ignore all of these comments and move on. I'll review everything once more and then we can merge.

eprbell

A few minor fixes needed.

src/dali/abstract_ccxt_input_plugin.py

eprbell · 2022-10-02T04:17:04Z

src/dali/abstract_ccxt_input_plugin.py

+        try:
+            while True:
+                pagination_details: PaginationDetails = next(pagination_detail_iterator)
+                trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(


Any thoughts on this?

src/dali/abstract_ccxt_input_plugin.py

eprbell · 2022-10-02T04:43:11Z

src/dali/abstract_ccxt_input_plugin.py

+        self._client: Exchange = self._initialize_client()
+        self._thread_count = thread_count if thread_count else self.__DEFAULT_THREAD_COUNT


All fields still need to start with double underscore (our data models are encapsulated and we never access fields directly outside the class in which they are defined). To implement protected behavior they need a property that starts with single underscore (to access in the subclass). Sorry if my description wasn't clear when we discussed this.

Here's an updated text that's hopefully clearer:

all class fields have two leading underscores because our data models are encapsulated and we never access fields directly outside the class in which they are defined

private fields: visible only to the class, nothing else to do (access within the class via the double-underscore filed)

protected fields: visible to the class and to its subclasses, needs to be encapsulated with a property or an accessor function that starts with one leading underscore

public fields: visible to everybody, needs to be encapsulated with a property or acccessor function that starts with no leading underscore.

Yeah, unfortunately we keep going around and around on this. I guess I understand the basic concept, but how does this look when implemented? So for client and logger, I should have double underscore:

self.__client self.__thread_count

And then, property accessors:

@property def client(self): return self.__client @property def thread_count(self): return self.__thread_count

Or do I use _get accessors?

(These fields are used in the subclasses.)

So, actual fields have leading double underscore:

self.__client self.__thread_count

Then:

if we want private access (this class only) we do nothing else

if we want protected access (this class and its subclasses), we add a property with a single underscore (or an accessor method starting with _get)

if we want public access (everybody), we add a property with no underscore (or an accessor method starting with get)

Whether to use a property or a get function is up to you: I tend to use properties if the access logic is fairly simple and get functions if access logic is more complicated. But again, either way works. Hope this is clearer, sorry if I didn't articulate it clearly enough earlier.

I added a data encapsulation paragraph (with the above description) to the Design Guidelines section of README.dev.md both in RP2 and DaLI. Let me know if you think it needs more explanation: I'd like to fix the docs so that nobody trips on it in the future.

macanudo527 · 2022-10-03T13:33:42Z

I just want to double check that you can see my most recent comments on __client and __thread_count. As well as the one about floats in CCXT.

eprbell · 2022-10-04T13:32:08Z

Unfortunately I cannot :-(

When you look at your comments do they show up as "pending"? If so you need to submit them to make them publicly visible. Not sure what else this could be, short of a site malfunction...

macanudo527

Ok, now, you should see all comments, right?

src/dali/abstract_ccxt_input_plugin.py

macanudo527 · 2022-09-26T12:56:44Z

src/dali/abstract_ccxt_input_plugin.py

+        try:
+            while True:
+                pagination_details: PaginationDetails = next(pagination_detail_iterator)
+                trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(


Yes, unfortunately.

src/dali/abstract_ccxt_input_plugin.py

macanudo527 · 2022-10-02T13:37:55Z

src/dali/abstract_ccxt_input_plugin.py

+        self._client: Exchange = self._initialize_client()
+        self._thread_count = thread_count if thread_count else self.__DEFAULT_THREAD_COUNT


Yeah, unfortunately we keep going around and around on this. I guess I understand the basic concept, but how does this look when implemented? So for client and logger, I should have double underscore:

self.__client self.__thread_count

And then, property accessors:

@property def client(self): return self.__client @property def thread_count(self): return self.__thread_count

Or do I use _get accessors?

(These fields are used in the subclasses.)

macanudo527 · 2022-10-02T13:46:53Z

src/dali/abstract_ccxt_input_plugin.py

+        try:
+            while True:
+                pagination_details: PaginationDetails = next(pagination_detail_iterator)
+                trades: Optional[List[Dict[str, Union[str, float]]]] = self._client.fetch_my_trades(


Yes, unfortunately. CCXT outputs floats for some reason. It also accepts floats for its other functions for amounts.

Huh, so you don't see my other comment on this? That's so bizarre, it's like Github made them all disappear for you.

src/dali/abstract_ccxt_input_plugin.py

macanudo527 · 2022-10-04T14:19:32Z

Ok, now, everything should show. I have no idea how I have used Github for so long and not realized you have to submit all your comments. I have no idea what I was doing before.

Sorry about that.

eprbell · 2022-10-04T20:11:06Z

Now I see it: mystery solved :-) Will take a look in the next 2-3 days (currently traveling).

eprbell

OK, I think I replied to all of the comments. Let me know if anything is still missing.

src/dali/abstract_ccxt_input_plugin.py

eprbell

		self._client: Exchange = self._initialize_client()
		self._thread_count = thread_count if thread_count else self.__DEFAULT_THREAD_COUNT

Ccxt Input Abstract Plugin #80

Ccxt Input Abstract Plugin #80

Conversation

macanudo527 commented Sep 4, 2022 • edited Loading

eprbell left a comment • edited Loading

Choose a reason for hiding this comment

macanudo527 commented Sep 5, 2022

eprbell commented Sep 5, 2022 • edited Loading

macanudo527 commented Sep 5, 2022

eprbell commented Sep 5, 2022

macanudo527 commented Sep 6, 2022

eprbell commented Sep 6, 2022

macanudo527 commented Sep 7, 2022

eprbell commented Sep 7, 2022

macanudo527 commented Sep 8, 2022

eprbell commented Sep 8, 2022

macanudo527 commented Sep 11, 2022

macanudo527 commented Sep 12, 2022

macanudo527 commented Sep 13, 2022

eprbell commented Sep 13, 2022

eprbell left a comment

Choose a reason for hiding this comment

macanudo527 commented Sep 28, 2022 • edited Loading

macanudo527 commented Sep 29, 2022 • edited by eprbell Loading

eprbell commented Sep 29, 2022

macanudo527 commented Sep 29, 2022

eprbell commented Sep 30, 2022 • edited Loading

eprbell left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

macanudo527 commented Oct 3, 2022

eprbell commented Oct 4, 2022

macanudo527 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

macanudo527 commented Oct 4, 2022

eprbell commented Oct 4, 2022

eprbell left a comment

Choose a reason for hiding this comment

eprbell left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

macanudo527 commented Oct 6, 2022

eprbell left a comment

Choose a reason for hiding this comment

eprbell left a comment

Choose a reason for hiding this comment

eprbell commented Oct 7, 2022

macanudo527 commented Oct 8, 2022

macanudo527 commented Sep 4, 2022 •

edited

Loading

eprbell left a comment •

edited

Loading

eprbell commented Sep 5, 2022 •

edited

Loading

macanudo527 commented Sep 28, 2022 •

edited

Loading

macanudo527 commented Sep 29, 2022 •

edited by eprbell

Loading

eprbell commented Sep 30, 2022 •

edited

Loading