Chaos Unicorn Meta Issue #7726

Open
oskarth opened this issue Mar 13, 2019 · 21 comments

Comments

@oskarth
Member

commented Mar 13, 2019

On April 1 we are going to shut down our cluster and other MITM services; see https://chaos-unicorn-day.org/ for more. This is an umbrella issue to ensure we prepare the minimal set of things needed to do this properly.

Please create additional issues as it makes sense and replace the brief description in this issue, or post a comment below.

Scope

  • When: April 1; 9am-9am UTC
  • Services impacted:
    a) Our cluster with bootnodes, mailservers, etc
    b) All HTTP services the app uses, including Infura and Etherscan.
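
As a rough illustration of the scope above, the shutdown window can be written as a half-open UTC interval. This is only a sketch: it assumes "9am-9am UTC" means 24 hours starting at 09:00 UTC on April 1, 2019, and the names `CHAOS_START`, `CHAOS_END` and `in_chaos_window` are hypothetical, not taken from the app.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "9am-9am UTC" is a 24-hour window starting April 1, 2019.
CHAOS_START = datetime(2019, 4, 1, 9, 0, tzinfo=timezone.utc)
CHAOS_END = CHAOS_START + timedelta(hours=24)


def in_chaos_window(now: datetime) -> bool:
    """Return True if `now` falls inside the half-open window [start, end)."""
    if now.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing.
        raise ValueError("now must be timezone-aware")
    return CHAOS_START <= now < CHAOS_END
```

A caller would typically pass `datetime.now(timezone.utc)`; the end of the window is exclusive, so 09:00 UTC on April 2 already counts as "over".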

Checklist

Cluster:

  • Shut down whispernodes, mailservers, and bootnodes during these hours (cc @jakubgs)
    • CI and sites like status.im not impacted
    • note that these nodes should be shut down even if you are running an old version of the app (no cheating!)
    • if I missed something, please mention it here
  • HTTP endpoint that app can query to know if Chaos Unicorn is active or not (can be either positive or negative signal, i.e. either 2xx or 4xx/5xx depending on which makes most sense from App POV)
    -- This can also be a gist or w/e
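
The endpoint idea above could be sketched as follows. This is an assumption-laden illustration, not the implementation that was shipped: the function names and the `positive_signal` switch are hypothetical, and the flag URL is left to the caller. It shows both readings mentioned in the checklist (2xx means active, or 4xx/5xx means active).

```python
import urllib.error
import urllib.request


def chaos_active_from_status(status_code: int, positive_signal: bool = True) -> bool:
    """Map an HTTP status code to "Chaos Unicorn is active".

    positive_signal=True:  2xx means active, anything else means inactive.
    positive_signal=False: 4xx/5xx means active, anything else means inactive.
    """
    if positive_signal:
        return 200 <= status_code < 300
    return 400 <= status_code < 600


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the signal.
        return err.code
```

Usage would look like `chaos_active_from_status(fetch_status(flag_url))`. Note that an unreachable endpoint raises `urllib.error.URLError` here, so the caller still has to decide whether "no answer" counts as active or inactive.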

App:

  • Ensure Infura, Etherscan and similar services are blocked, e.g. using HTTP endpoint above (@mandrigin)
  • Time based warning that this is happening, e.g. a note with a link to https://chaos-unicorn-day.org/ (cc @mandrigin @hesterbruikman)
  • This app version needs to be released before Chaos Unicorn Day
  • (Optional: difficulty bomb so that Chaos Unicorn Day is periodically activated by default, e.g. every 3m, then increasingly more frequently)
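
For the first App item, a minimal sketch of refusing requests to third-party services while Chaos Unicorn is active might look like this. The domain list and the `is_blocked` helper are hypothetical and only illustrate the idea; the real app would keep its own list and hook this into its own HTTP layer.

```python
from urllib.parse import urlparse

# Assumption: illustrative domains only; "similar services" would be added here.
BLOCKED_DOMAINS = ("infura.io", "etherscan.io")


def is_blocked(url: str, chaos_active: bool) -> bool:
    """Return True if a request to `url` should be refused."""
    if not chaos_active:
        return False
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself and any subdomain, but not lookalike hosts.
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)
```

The `chaos_active` flag is passed in rather than computed here, so the same check works whether the signal comes from an HTTP endpoint or from a time-based rule.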

Comms:

  • Blog post with short intro and re-iterate (@oskarth also @Swader if you are interested in helping out)
  • Tweet 7d / 1d in advance (@oskarth I can do something simple, but for that magic touch @j-zerah or @StatusSceptre might have some ideas)
  • Any other comms people can think of, e.g. Reddit helper thread (can see ad hoc)

Personal prep:

  • Guides on how to run nodes and add friends' nodes (bootnodes, mailserver, VIPNode, dappnode), etc. (cc @adambabik)
  • Anything else you can think of

Operationally:

  • Ensure we follow what's going on and can monitor/coordinate outside of Status (@corpetty )
    Where do we want to do this? Or leave it underspecified.

During and after:

  • construct timeline of response and reactions, for lessons learned
  • monitor and spread information
  • problem solve and try to get shit working with custom builds/other guides etc (all)

Misc

Notes from call https://notes.status.im/h37eocOyQei2j5THqUPC6A#

Acceptance criteria

This issue can be closed when:
(a) preparation has been done
(b) a link to the post-mortem / lessons learned and follow-up actions has been posted

@oskarth

Member Author

commented Mar 13, 2019

@Swader

Contributor

commented Mar 13, 2019

I will write a tutorial on how to run your own always-on zero-cost Full Ethereum and Status node.

@annadanchenko

Member

commented Mar 13, 2019

list of known issues to be fixed:
#7553 Error fx/defn expects a map of cofx as first argument... if try to delete custom bootnode
#7681 Warning about changing network appears only after logout and if you tap on "Connect" 2 times
#4935 Account is blocked by Ethereum node started incorrectly error if custom network url doesn't match chain
#7752 No error message if provide wrong mailserver or bootnode url
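
Issue #7752 above is about missing feedback for a wrong mailserver or bootnode URL. Purely as an illustration (this is not the app's validation code), a check could look like the sketch below. It assumes the usual `enode://<node id>@host:port` shape with a 128-hex-character node ID and an optional `:password` part for mailservers; the regex, the `validate_enode` name and the error strings are all hypothetical.

```python
import re
from typing import Optional

# Assumption: 128 hex chars of node ID, optional ":password", host, numeric port.
ENODE_RE = re.compile(
    r"^enode://(?P<id>[0-9a-fA-F]{128})"
    r"(?::(?P<password>[^@]+))?"
    r"@(?P<host>[^:@/\s]+):(?P<port>\d{1,5})$"
)


def validate_enode(url: str) -> Optional[str]:
    """Return an error message for a malformed enode URL, or None if it looks valid."""
    match = ENODE_RE.match(url.strip())
    if match is None:
        return "expected enode://<128 hex chars>[:password]@host:port"
    if not 1 <= int(match.group("port")) <= 65535:
        return "port must be between 1 and 65535"
    return None
```

Returning a message instead of a boolean makes it easy to show the user why the input was rejected. The sketch does not handle IPv6 hosts or query parameters.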

@mandrigin

Contributor

commented Mar 14, 2019

Pivotal epic for this, for the Core Improvements team: https://www.pivotaltracker.com/epic/show/4218446

We will work on these tasks specifically next week (March 18th-22nd). We will pause all the other activities in Core Improvements for that week (except release testing).

@oskarth

Member Author

commented Mar 14, 2019

Draft blog post: https://our.status.im/p/3c33ad00-c5c5-4bbb-852e-2609f3a09995/ It goes live in 24h, so please let me know if you have any feedback.

@oskarth

Member Author

commented Mar 15, 2019

@annadanchenko

Member

commented Mar 20, 2019

@corpetty suggested adding a custom mailserver option to desktop before April 1; otherwise desktop will be useless. Any feedback, @rachelhamlin @oskarth @vkjr?

@Swader

Contributor

commented Mar 20, 2019

@oskarth

Member Author

commented Mar 21, 2019

@annadanchenko I think that's a good idea. @vkjr is this doable? Perhaps it is not even needed if we can merge the mobile UX in time.

@annadanchenko

Member

commented Mar 21, 2019

@vkjr if we use the current desktop (not the mobile UI one), then in addition to adding the custom mailserver feature we also need to fix #7430, as all users will face it on Chaos Unicorn Day.

@siphiuel

Member

commented Mar 21, 2019

I'll work on the mailserver fixes for Desktop.

@annadanchenko

Member

commented Mar 21, 2019

> I'll work on the mailserver fixes for Desktop.

@siphiuel to summarise expected fixes for Desktop:

  1. add the ability to add a custom mailserver, switch to it, and remove it
  2. add the ability to enable bootnodes, add a custom bootnode, and remove it
  3. fix #7430

@vkjr

Member

commented Mar 21, 2019

@siphiuel, thanks for taking care of this!
(I'm on vacation till 30th March)

@j-zerah

commented Mar 22, 2019

@oskarth how's this for some general comms leading up to the shutdown:
https://notes.status.im/0CFcsVB-Ru2zzv2lzckUVg

@oskarth

Member Author

commented Mar 25, 2019

@j-zerah looks great, perfect!

@rachelhamlin

Member

commented Mar 27, 2019

@oskarth I just commented this in #312-janitors but probably better here:

Have we provided user-friendly specifics as to which features will break on Chaos Unicorn Day? The blog posts seem more geared toward advanced users in that they name services that will be shut down, but do not specify the product impact.

Can we put the website and blog posts in more layman's terms? E.g. "You will not receive messages or see message history. Workaround: add a custom bootnode and mailserver."

@oskarth

Member Author

commented Mar 28, 2019

@rachelhamlin The blog post was meant for all users, but perhaps some things could be made more explicit. I'd be a bit cautious of doing too much preparation/busy work as we want to see how things work in the real world when things break without forewarning, e.g. as a fire drill.

If we want to change some copy, I suggest editing it directly here: https://github.com/status-im/chaos-unicorn-day/edit/master/README.md

If someone has specific copy they also want included in the Discuss post or blog post, feel free to suggest it and I can add it in.

@Swader

Contributor

commented Mar 28, 2019

The post says users on mobile 0.11+ will get an error after attempting to send a tx. This does not clarify whether the error is just a reporting error or whether the tx actually fails. I would assume it does not actually fail, as we have status-go on mobile for a reason (that reason being signing txs and sending them). Am I correct?

@oskarth

Member Author

commented Apr 17, 2019

This was tracked in Pivotal Tracker etc. Chaos Unicorn Day is done; see the lessons-learned post: https://our.status.im/chaos-unicorn-day-what-we-learned-by-breaking-status/

Follow ups tracked in retrospective and Wrike. Closing this one.

@oskarth closed this Apr 17, 2019

@jarradh reopened this May 11, 2019

@oskarth

Member Author

commented May 13, 2019

@jarradh why did you reopen this issue? The follow-ups have already been captured in separate tasks, the retro, and Wrike. If we want them visible on GitHub, I suggest we make a new meta issue, as this one is obsolete.
