[Epic][PdrDashboard] Analytics for predictoors #1328

Open
2 of 4 tasks
trentmc opened this issue Oct 10, 2023 · 17 comments

Comments

@trentmc
Member

trentmc commented Oct 10, 2023

Background / motivation

Our core users are predictoors & traders who use our pdr-backend python bots. We want to reduce friction for them.

Even though they operate largely in python-land, there are things we can do in the webapp to help them out.

Top (or near top) of the list is to help them answer the Q: "How much $ am I making / losing". There are many drill-down Q's that emerge from that.

This epic is for predictoors. We will do a follow-on epic for traders, when appropriate.

TODOs: Predictoor Dashboard

  • Explore top-level Q's, and drill-down Q's. Done: this comment. And other comments too! :)
  • Prototype visuals. Done: here (Gslides)
  • Build dashboard, iteratively
  • (When appropriate) create similar epic for trader dashboard. Link to this one.

Related

  • pdr-web#48 is "[Predictoor FE] Backlog". It has some analytics-related issues. This issue needs to reconcile with that.
  • pdr-web#66 is "Predictoors Leaderboard". It's likely a subset of this, but useful to keep open for now because we'll almost certainly want to highlight the leaderboard; and we want to ensure that it doesn't get missed
  • (closed - duplicate) pdr-private#38 is "Fetch, visualize and monitor prediction metrics". There was overlap between that issue and this one. So we closed pdr-private#38.
@KatunaNorbert
Member

Predictoor Q’s:
What were the total available rewards for the last x hours/x days?
How many right/wrong predictions did I have in the last x hours/x days?
How many tokens did I win/lose on predictions in the last x hours/x days?
How much am I making compared to other predictoors?
How much revenue comes in via sales?

Trader Q’s:
How much did I spend on subscriptions in the last week?
Which assets have the most sales?

@trentmc trentmc changed the title: UX+: Analytics for traders, to answer "How much $ am I making" --> UX+: Analytics for predictoors & traders, to answer "How much $ am I making" (Oct 12, 2023)
@trentmc
Member Author

trentmc commented Oct 12, 2023

Predictoor Q's:

  • What's net income across all predictoors in the previous 24h? 7d? Where net income = (gross revenue) - (costs)
    • Slice & dice: like gross revenue (see below)
    • Also include: % gains over shown period, and equivalent APY (ie % gains annualized)
    • Can ignore "slice & dice by contributions" since "gross revenue" & "cost" contributions cover it
  • What's gross revenue across all predictoors in the previous 24h? 7d? As a lump sum (scalar value), and value vs time (plot).
    • Slice & dice: value (gross revenue) across my predictoors? (There may be >=1 address). Versus value across all other predictoors.
    • Slice & dice: value across specific predictoors? (E.g. the 3 with highest net income) (=Generalize "my")
    • Slice & dice: value contribution per revenue component? (E.g. non-DF sales, DF sales, stake winnings)
    • Slice & dice: value for specific trading venues, or pairs, or timescales; value for specific data nft (asset)
    • Slice & dice: OCEAN component, ROSE component
    • Slice & dice: rank the predictoors by: APY, highest net income first, highest revenue first, lowest cost first
    • Slice & dice: rank the assets by: highest sales first, highest net income first, highest revenue first
  • What are the costs across all predictoors in the previous 24h? 7d?
    • Slice & dice: like gross revenue
    • For cost contributions, components will be: stake slashings, tx fees, $ to buy prediction feeds
    • "Slice & dice: rank the predictoors by highest net income first" == predictoors leaderboard. We may wish to highlight this. Ref pdr-web#66 "Predictoors leaderboard"
  • What's prediction & accuracy across all predictoors in the previous 24h? 7d?
    • Slice & dice: what was the predicted value? True value? Times when predicted == True
    • Slice & dice: what was total # OCEAN staked? # OCEAN up? # OCEAN down?
    • Slice & dice: what was total # predictions? (Over the whole time period, and per epoch if possible)
    • Slice & dice: like gross revenue. With specific twists for accuracy
  • I'd like to monitor bots uptime, and debug as needed
    • Possible bots: trueval, dfbuyer, my predictoors, all predictoors (where possible)
    • For each bot: what are its logs? Can I download them?
    • For each bot: what epochs did it "do its thing"? What was $OCEAN spent, $received, net $? $ROSE? Total in USD?
  • What does the model itself look like?
    • Inspiration: this is unlike the ones above. Rather, it's shaped a lot like FDS. See "inspiration from Trent's prev work" below.
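
To make the first question concrete, here is a minimal sketch of the arithmetic for net income, % gains over the shown period, and the annualized equivalent. It is an illustration only, not code from pdr-backend. The names `PeriodSummary` and `summarize_period` are hypothetical, and so is the `capital` argument: the comment above does not say what base the % gains are measured against, so the sketch assumes some notion of capital at risk. The annualization is a simple linear scaling; a compounding APY would give a different number.

```python
from dataclasses import dataclass


@dataclass
class PeriodSummary:
    gross_revenue: float    # e.g. sales + stake winnings, in one currency
    costs: float            # e.g. stake slashings + tx fees + feed purchases
    net_income: float       # gross_revenue - costs
    pct_gain: float         # net income as a % of capital over the period
    annualized_pct: float   # pct_gain scaled linearly to 365 days


def summarize_period(gross_revenue: float, costs: float,
                     capital: float, period_days: float) -> PeriodSummary:
    """Hypothetical helper: net income, % gain, and annualized % gain."""
    if capital <= 0 or period_days <= 0:
        raise ValueError("capital and period_days must be positive")
    net_income = gross_revenue - costs
    pct_gain = 100.0 * net_income / capital
    # Assumption: simple (non-compounding) annualization.
    annualized_pct = pct_gain * (365.0 / period_days)
    return PeriodSummary(gross_revenue, costs, net_income,
                         pct_gain, annualized_pct)
```

For a 7d window, `summarize_period(120.0, 100.0, 1000.0, 7.0)` gives a net income of 20 and a 2% gain on the assumed capital, which is then scaled by 365/7 for the annualized figure.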

What I envision for rendering this data:

  • One or many plots of value vs time, taking up the right 2/3 of window
  • With >=1 widgets on the left 1/3 of window to do filtering
  • And complementary tables. TBD whether part of the plots, or separately (need to prototype)

Slides:

@trentmc trentmc changed the title: UX+: Analytics for predictoors & traders, to answer "How much $ am I making" --> UX+: Analytics for predictoors & traders, to answer "How much $ am I making" then drill in (Oct 12, 2023)
@trentmc
Member Author

trentmc commented Oct 16, 2023

Update: I've done some first cut pencil-and-paper prototypes in the 2023 10 FE Prototypes GSlides. I'm sharing early to communicate how I'm thinking about this. It's a CAD tool for predictoors and traders! :)

I don't plan to spend more time at this right now. I will only when this overall issue becomes a priority, which might be soon and might not be soon.

@idiom-bytes
Member

idiom-bytes commented Oct 20, 2023

Hi, as we need to implement accuracy calculations for the FE (2000 samples), I recommended @kdetry to start thinking of how to build this:

  1. inside pdr-backend
  2. using python

High level flow of how this might get used:

  • pdr-analytics will compile all of this into a dataframe (aggregated and pre-computed) that can be served via an API endpoint, rather than computed JIT via subgraph
  • pdr-analytics api/py will take an optional [list of feed addresses] to return individual or batch results
  • pdr-analytics api/py will be served to both FE clients, in addition to pdr-backend agents. FE can access via API while Agents can access directly through a .py function, or through their local/remote API.
  • pdr-trader will verify that the rolling accuracy on a feed is good in addition to having decent stake
  • pdr-trader may want to look at acc_last_2000, acc_last_500, and acc_last_50 samples for a particular feed before trading
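
As a rough sketch of the rolling-accuracy check in the last two bullets: given a feed's prediction outcomes ordered oldest to newest, accuracy over the last N samples is the fraction of correct outcomes in that tail. The function names and the input shape (a sequence of booleans) are assumptions for illustration; only the `acc_last_2000`, `acc_last_500`, and `acc_last_50` labels come from the list above.

```python
from typing import Dict, Sequence


def rolling_accuracy(correct: Sequence[bool], window: int) -> float:
    """Fraction of correct predictions among the most recent `window` samples.

    `correct` is ordered oldest to newest. If fewer than `window` samples
    exist, all available samples are used.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = correct[-window:]
    if len(recent) == 0:
        raise ValueError("no samples available")
    return sum(recent) / len(recent)


def accuracy_report(correct: Sequence[bool],
                    windows: Sequence[int] = (2000, 500, 50)) -> Dict[str, float]:
    """Hypothetical helper: one accuracy value per window size."""
    return {f"acc_last_{w}": rolling_accuracy(correct, w) for w in windows}
```

A trader bot could compare these values against its own thresholds before trading on a feed. Comparing a short window with a long one also shows whether a feed's accuracy is currently improving or degrading.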

@trentmc
Member Author

trentmc commented Oct 21, 2023

When I presented the prototypes this past Thursday, I described how we can evolve from something super-simple to a high-quality webapp.

Here I flesh it out, as practical as possible, pointing to code that exists and that can be evolved.

The spectrum, from simplest first:

  1. Status quo. In pdr_backend "simulation flow" (pdr_backend/predictoor/approach3), locally generate & show matplotlib plots at the end of the sim. Show profit vs. time for traders, predicted vs actual for predictoors, etc. We already have a first-cut of this, and will improve organically. Eg #272 profit vs time for predictoors.
  2. Add realtime. In pdr_backend "simulation flow", locally generate & show matplotlib plots in realtime as the simulation progresses. pdr-backend#279
  3. Put in bot flow. In pdr_backend "predictoor bot flow" & "trader bot flow", for local bots, the bot itself generates & shows matplotlib plots in realtime. pdr-backend#280
  4. Plot from different process. In pdr_backend "bot flows", for local bots, a separate local process grabs chain data then generates & shows matplotlib plots in realtime. The separate process is a new directory pdr_backend/analytics. It uses subgraph query to grab chain data. Bonus: this allows bots to be run remotely too. pdr-backend#281
  5. Put in webapp. In pdr_backend "bot flows", for local or remote bots, a separate local pdr_backend/analytics process grabs chain data, then generates & renders pythonic plots into a webapp via streamlit or dash. The analytics process serves up an API consumed by webapp. pdr-backend#282
  6. Remote analytics service. The analytics process API is run as a remote web service.

We have (1). We can do a "tracer bullet" starting with (2) and going all the way through (6). Then we can continually flesh out plots at the level of (1-2: simulation flow), and as they mature we pull them into (4-6: analytics service).

Update: I converted (2-5) to github issues, linked above. And all this work is part of a new issue: "[EPIC] [Simulation, bots] Easy-to-use & powerful simulation --> predictoor/trader bot flow" pdr-backend#278

@trentmc
Member Author

trentmc commented Oct 21, 2023

@idiom-bytes wrt your comment of what @kdetry can be doing: it's really describing an architecture for step (6) in my comment above -- "remote analytics service"

Rather than directly jumping to (6), it might be wise to go through steps (2) - (5) first, tracer bullet style. This will ensure that we have a pipeline from rough prototypes (steps 1,2) all the way through to production remote analytics service (step 6).

Thoughts?

@idiom-bytes
Member

idiom-bytes commented Oct 23, 2023

2000-sample Accuracy

I generally agree that the analytics system could be responsible for helping to execute the data/graphic work across all 6-features. However, since some of these are already working, and we want to onboard others, it might be easier to onboard with small tasks from (6) and then bring other existing workflows (1) over to this module.

Example: (6) right now is really small. It just needs 2k sample accuracy for 2x timeframes, so:

  1. leverage approach3/data_factory.py to do all the checkpointing/fetching
  2. pull all prediction results from subgraph
  3. calculate accuracy_5m, accuracy_1h
  4. serve these via flask/gunicorn
  5. serve these via the python module
  6. update pdr-web to fetch the accuracy from the server

How:

There is a script named data_factory.py which does some nice work to maintain a checkpoint of how much it has downloaded so far. I imagine that data_factory and some of the work that trent has done so far, would benefit from being abstracted and moved into something more general like /utils/ so other systems can use it.

Inside pdr-backend, you should be able to just import/instantiate/configure a data_factory, and start using it.
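
The checkpointing idea can be sketched generically as below. This does not reproduce data_factory.py or its API. The function `fetch_incrementally`, the JSON checkpoint file, and the `fetch_since` callable (standing in for a subgraph query) are all hypothetical names chosen for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def fetch_incrementally(
    checkpoint_path: Path,
    fetch_since: Callable[[int], List[Dict]],
) -> List[Dict]:
    """Fetch only records newer than the saved checkpoint, then advance it.

    `fetch_since(ts)` is a hypothetical callable returning records whose
    "timestamp" field is strictly greater than `ts`.
    """
    last_ts = 0
    if checkpoint_path.exists():
        last_ts = json.loads(checkpoint_path.read_text())["last_timestamp"]

    new_records = fetch_since(last_ts)
    if new_records:
        # Persist the newest timestamp so the next run skips these records.
        newest = max(r["timestamp"] for r in new_records)
        checkpoint_path.write_text(json.dumps({"last_timestamp": newest}))
    return new_records
```

Each run then downloads only what has appeared since the previous run, which is what makes it practical to serve results from disk.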

Server vs. Local

The server:

  • Can fetch from subgraph using data_factory, and then serve the results from disk.
  • remote pdr-web: should be able to get this data by fetching from a remote server deployed by pdr-backend.

The bot:

  • Can do the same, but it will be done internally/local rather than through a remote call.
  • bot/pdr-trader or another local service should be able to get this data by importing the python module internally, and fetching data from its own local disk.

Further updates

I also propose data_factory gets an update to use polars + parquet to do this. It's incredibly fast, and will enable us to grow

@trentmc
Member Author

trentmc commented Oct 23, 2023

Thanks for the thoughts @idiom-bytes .

OK to do a small (6) now, for the 2K thing. (Via the 2K github issue.)

Please don't use approach3/data_factory.py for that. It has completely different goals.

(Fyi I have a github issue to move the simulator stuff from approach3 directory to a more general place. That too is outside the scope of 2K work. And I want to do it when I get back because I know exactly what I want, and how to do it. So please don't do it in the meantime. Focus on 2K.)

@KatunaNorbert
Member

For (6), it's not clear to me: when you talk about serving the accuracy analytics data via a server, do you mean using pdr-backend or a different service?
If it's a different service, I propose that we use pdr-websocket.

@trentmc
Member Author

trentmc commented Oct 24, 2023

For (6) it's a different service. Not pdr-backend.

I don't rule out pdr-websocket. I defer to you (Norbert) and Mustafa and Roberto.

@idiom-bytes
Member

idiom-bytes commented Oct 25, 2023

[WRT pdr-websockets]
This is just for having a pk that can talk to the contract + w/o exposing to the client.
Which has been leading to all sorts of maintenance issues.

[WRT Websockets Forwardlooking]
pdr-web + pdr-websockets should be nearly-frozen for now. pdr-ws has been nightmarish to support, a lot of code is getting duplicated/fragmented against pdr-web. Rather than building a pdr-fe-util lib to start addressing some of this problem... I think there is a solution to tech spike that would reduce this complexity by an order of magnitude.

How?

  1. Deploy next app as a UI client on vercel.
  2. Deploy the same next app as a pure server/backend w/PK on prod vm.
  3. Kill websocket and supporting another stack. It's now unnecessary (1)(2).
  4. (1) reads from (2)

(1) and (2) are deployed in separate environments but share the exact same stack. PK is not shown to client. We leverage more of next.js native functionality.

*** I have created Ticket oceanprotocol/pdr-web#283 in pdr-web to represent this

[WRT dApp/Predictoor Analytics (6)]
Based on Trent's feedback...
(A) I think leaderboards, epoch summaries, ecosystem metrics, and all sorts of things, should be written in python, in a clean module that is self-contained, atomic, and easy to import.
(B) Rather than querying for GQL each time. This system should dump all data from subgraph, and build summaries for everything. This will look like an etl workflow. Only fetch what's needed, and update the data. Think parquet + dataframes.
(C) As a pdr-trader, I'll want to query this system in addition to trained models that have obtained this data, as a way to understand other user behaviors, competitiveness across feeds, which ones are buying, and have high-level trading agents decide which feeds to use, or which predictoor feeds to submit to.
(D) If desired, in the future, this service could sit in front of a GQL provider
(E) As an app developer, I can easily query this data through remote/fetch/GET.
(F) As a builder in pdr-backend, I can import this module, run the etl locally, and query the local cache directly from the app during my epoch updates. Example: Copy trading from known predictoors that are incredibly accurate.
(G) As an ML engineer in pdr-backend, I can import the module, run the etl locally, and query the local cache directly to build my dataframes and features w/ behaviors from Predictoors.
(H) If desired, this module could be easily extended w/ a FE to take all metrics/graphs/etc, and serve it to streamlit/etc...

*** I have created Ticket oceanprotocol/pdr-web#284 in pdr-web to represent this

[Final Remarks]
pdr-websocket was primarily used to not expose a pk to the client.
pdr-web will get bloated and this code will never be re-used if it ends up in there.
This doesn't belong in pdr-websocket or pdr-web.
Do not write it in JS either.

All of our data science and knowledge is being written in py.
I want to be reading directly from the py stack.
Please view this as a data problem, not an app problem.

@KatunaNorbert
Member

Hey @trentmc, I was double checking on 'The spectrum, from simplest first' described above.
Looks like you are assigned to the first step; can I start working on the second step? There is already a first cut of the first step available that I could use to move things forward.

@trentmc
Member Author

trentmc commented Nov 10, 2023

Hey @trentmc, I was double checking on 'The spectrum, from simplest first' described above. Looks like you are assigned to the first step; can I start working on the second step? There is already a first cut of the first step available that I could use to move things forward.

TBH I'd prefer to handle this myself, and the follow-up steps. I've finally got "Ship Predictoor DF" off my plate, and I intend to go through all these steps ASAP, and quickly. (Written as an EPIC in pdr-backend#278.)

FYI the "FE: backlog" column in DF/VE board has many items that could be covered.

@KatunaNorbert
Member

Ok, sure, sounds good. I was kind of expecting that you were going to go through this; that's why I wanted to check.
Looks like Predictoor stats are a high priority now: since Predictoors are now able to make money, showing this to the community should help incentivise people to onboard.

@KatunaNorbert
Member

KatunaNorbert commented Nov 10, 2023

For example, a Predictoor leaderboard on the UI displaying the top x predictoors with their returns and accuracy.
I checked your prototype and it is mainly focused on 'how much I make'. We might also want to have a section about 'how much others make', so users can see that they can make money before they start onboarding.
Oh, NVM, I see there is a page for Predictoors, where you can see information about other Predictoors.

@idiom-bytes
Member

idiom-bytes commented Nov 16, 2023

Hey Norbert, we get all of this out-of-the-box if we have the data and streamlit setup in a certain way. Let's continue to write down questions + design dashboards, and then figure out the data pipeline + tables we need to serve all of this.

image

example w/ a bronze->silver->gold pipeline:
pdr_backend/data/gold/user_summary.parquet

  • user_address (str)
  • feed (str)
  • timeframe (str)
  • preds (int)
  • correct_preds (int)
  • acc (float)
  • earnings (float)
  • losses (float)
  • net (float)
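
As a sketch of how such a gold table could be derived, the function below aggregates per-prediction rows into one row per (user_address, feed, timeframe) with the columns listed above. The input shape is an assumption: each raw row is taken to carry `user_address`, `feed`, `timeframe`, a boolean `correct`, and a signed `payout` (positive when tokens were earned, negative when lost). The real pipeline proposed here would use dataframes and parquet; plain Python is used only to keep the example self-contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_user_summary(predictions: Iterable[Dict]) -> List[Dict]:
    """Aggregate per-prediction rows into one row per (user, feed, timeframe)."""
    groups: Dict[Tuple[str, str, str], Dict] = defaultdict(
        lambda: {"preds": 0, "correct_preds": 0, "earnings": 0.0, "losses": 0.0}
    )
    for p in predictions:
        g = groups[(p["user_address"], p["feed"], p["timeframe"])]
        g["preds"] += 1
        g["correct_preds"] += int(p["correct"])
        # Assumption: payout is signed; split it into earnings and losses.
        if p["payout"] >= 0:
            g["earnings"] += p["payout"]
        else:
            g["losses"] += -p["payout"]

    rows = []
    for (user, feed, timeframe), g in sorted(groups.items()):
        rows.append({
            "user_address": user, "feed": feed, "timeframe": timeframe,
            **g,
            "acc": g["correct_preds"] / g["preds"],
            "net": g["earnings"] - g["losses"],
        })
    return rows
```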

On the streamlit side we can add a couple of dropdowns and a text field to serve the result:

  • Feed Dropdown -> Group/sort by feed (top predictoor per feed)
  • Timeframe Dropdown -> Group/sort by timeframe (top predictoor per feed)
  • Wallet List Text Field -> Group/sort by list of wallet addresses (1 whole user is a composite of multiple wallets)
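
The logic behind those three controls might look like the sketch below. It covers only the filtering and ranking, not the streamlit UI, and the function name `rank_by_net`, the `"my_wallets"` label, and the choice to rank by net are all assumptions. Rows are assumed to follow the user_summary columns above. When a wallet list is given, the matching wallets are merged into one composite user.

```python
from typing import Dict, List, Optional, Sequence


def rank_by_net(rows: Sequence[Dict],
                feed: Optional[str] = None,
                timeframe: Optional[str] = None,
                wallets: Optional[Sequence[str]] = None) -> List[Dict]:
    """Filter summary rows, total them per user, and sort by net (highest first)."""
    wallet_set = {w.lower() for w in wallets} if wallets else None
    totals: Dict[str, Dict] = {}
    for r in rows:
        if feed is not None and r["feed"] != feed:
            continue
        if timeframe is not None and r["timeframe"] != timeframe:
            continue
        user = r["user_address"].lower()
        if wallet_set is not None:
            if user not in wallet_set:
                continue
            user = "my_wallets"  # merge listed wallets into one composite user
        t = totals.setdefault(user, {"preds": 0, "correct_preds": 0, "net": 0.0})
        t["preds"] += r["preds"]
        t["correct_preds"] += r["correct_preds"]
        t["net"] += r["net"]

    ranked = [
        {"user": u, **t,
         "acc": t["correct_preds"] / t["preds"] if t["preds"] else 0.0}
        for u, t in totals.items()
    ]
    ranked.sort(key=lambda x: x["net"], reverse=True)
    return ranked
```

Accuracy is recomputed from the summed counts rather than averaged across rows, so a composite user with wallets of very different sizes still gets a correctly weighted figure.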

@trentmc trentmc transferred this issue from oceanprotocol/pdr-web Nov 20, 2023
@trentmc trentmc transferred this issue from oceanprotocol/pdr-backend Nov 21, 2023
@trentmc trentmc changed the title: UX+: Analytics for predictoors & traders, to answer "How much $ am I making" then drill in --> [Super Epic] Analytics for predictoors & traders, to answer "How much $ am I making" then drill in (Nov 21, 2023)
@trentmc
Member Author

trentmc commented Nov 21, 2023

Architecture: from this Slack msg

Below is a design for analytics architecture, and its relation to pdr-backend. From a discussion among Berkay, Roberto, myself.

Have three separate repos:

  1. pdr-lake. Grabs on-chain data and CEX data, and stores as a "data lake" of parquet files. Continuously runs to update in real-time. To build the first cut of this, we'd move some of pdr_backend/data_eng/data_factory.py code here; as well as some/all of pdr_backend/subgraph*.py. Probably also use "cryo" tool.
  2. pdr-analytics. Grabs data from the data lake by directly querying the parquet files (no REST API), and generates and renders interactive plots in the browser via matplotlib & streamlit.
  3. pdr-backend. Grabs data from the data lake, and supports the flows for simulation, pdr bot, pdr trader.

Usage:

  • Each of us would always be running a pdr-lake service locally, and filling our own local data lake.
  • Then each of us could run pdr-analytics app, or a pdr-backend app (sim, pdr bot, trader bot) which consumes data from the lake.

Not near term: Only once the above is stabilized and the analytics fleshed out nicely (2+ mos from now), we can...

  • Remove friction in filling the initial data lake. Eg have a dedicated github repo storing historical data, like paradigm has.
  • Make pdr-analytics renderings easy for people to see from predictoor.ai. Eg by running pdr-lake and pdr-analytics' backend in the cloud, with small hooks in pdr-web to render the output of pdr-analytics' backend.
  • But explicitly do not work on either of those now, because it will get in the way of rapid development of the data lake and the pdr-analytics plots.

Near-term order-of-dev-work:

  1. Finish YAML & CLI in pdr-backend [Trent, w Berkay]
  2. In pdr-backend, refactor csvs -> parquet, and pandas -> polars. [Roberto]
  3. First-cut pdr-lake repo. Move code as appropriate from pdr-backend. Get it all running, including where pdr-backend consumes data from the lake. Just csvs & pandas. Outcome: following pdr-backend READMEs now involves using pdr-lake repo too. [Trent, likely others]
  4. First-cut pdr-analytics repo. First outcome: simple first-cut plots being rendered in the browser. [Roberto, Norbert, Mustafa]
  5. Then, we can iterate on pdr-backend, pdr-lake, and pdr-analytics in parallel. (And any breaking changes to pdr-lake need to get propagated into pdr-backend and pdr-analytics) [all]

One more thing: keep Mustafa's new service for the accuracy estimation in pdr-backend for now. (Avoid rocking the boat here for now. Revisit when we make pdr-analytics live on predictoor.ai)

@trentmc trentmc transferred this issue from oceanprotocol-archive/pdr-anl Jul 2, 2024
@trentmc trentmc changed the title: [Super Epic] Analytics for predictoors & traders, to answer "How much $ am I making" then drill in --> [Epic][PdrDashboard] Analytics for predictoors (Jul 9, 2024)