During meta startup, leader may not register itself #30

Open
atellwu opened this issue May 29, 2019 · 0 comments
atellwu commented May 29, 2019

Describe the bug

  • During meta startup, the leader may fail to register itself.
    • In startProcess of setLeaderProcessListener, if sendNotify throws an exception, registerCurrentNode() is never executed. As a result, the metaLists held by data and session are incomplete (the leader IP is missing), and data subsequently fails to start.
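
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the sofa-registry code: the class `LeaderStartupSketch`, the `Notifier` interface, the static `metaList`, and both `startProcess*` methods are hypothetical stand-ins, and only the names sendNotify and registerCurrentNode come from this report. The guarded variant shows one possible remedy (catching the notify failure so registration still runs); it is not necessarily the fix applied in the referenced commit.

```java
import java.util.ArrayList;
import java.util.List;

class LeaderStartupSketch {
    // Hypothetical stand-in for the meta node list that data and session read.
    static final List<String> metaList = new ArrayList<>();

    // Hypothetical stand-in for the notification step that may fail.
    interface Notifier {
        void sendNotify() throws Exception;
    }

    static void registerCurrentNode(String leaderIp) {
        metaList.add(leaderIp);
    }

    // Ordering described in the report: if sendNotify throws,
    // registerCurrentNode is skipped and the leader IP is never added.
    static void startProcessBuggy(Notifier notifier, String leaderIp) throws Exception {
        notifier.sendNotify();
        registerCurrentNode(leaderIp);
    }

    // One possible remedy (an assumption, not the project's confirmed fix):
    // log the notify failure and register the leader regardless.
    static void startProcessGuarded(Notifier notifier, String leaderIp) {
        try {
            notifier.sendNotify();
        } catch (Exception e) {
            System.err.println("sendNotify failed: " + e.getMessage());
        }
        registerCurrentNode(leaderIp);
    }

    public static void main(String[] args) {
        Notifier failing = () -> { throw new IllegalStateException("notify failed"); };

        try {
            startProcessBuggy(failing, "10.0.0.1");
        } catch (Exception e) {
            // The exception propagates before registration happens.
        }
        System.out.println("buggy variant registered leader: " + metaList.contains("10.0.0.1"));

        startProcessGuarded(failing, "10.0.0.1");
        System.out.println("guarded variant registered leader: " + metaList.contains("10.0.0.1"));
    }
}
```

An alternative with the same effect would be to call registerCurrentNode() before sendNotify; which ordering is appropriate depends on whether the notification is expected to carry the leader's own entry.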
atellwu pushed a commit to atellwu/sofa-registry that referenced this issue May 29, 2019
@atellwu atellwu self-assigned this May 29, 2019
@atellwu atellwu added the bug label May 29, 2019
Synex-wh added a commit that referenced this issue Sep 26, 2019
* fix temp push

* update version 5.2.1-SNAPSHOT

* fix test case

* fix jetty version,and fix rest api for dataInfoIds

* fix hashcode test

* fix working to init bug

* fix start task log

* fix Watcher can't get providate data,retry and finally return new

* add data server list api

* add server list api

* remove log

* fix issue 21

* add query by id function

* fix issue 22

* delay client off process and sync data process to working status

* fix data connect meta error

* fix inject NotifyDataSyncHandler

* fix start log

* add send sub log

* fix subscriber to send log

* bugfix: #27

* bugfix: #27

* feature: Add monitoring logs #29

* feature: Add monitoring logs #29
(1) bugfix CommonResponse
(2) format

* bugfix: During meta startup, leader may not register itself #30

* bugfix: Sometimes receive "Not leader" response from leader in OnStartingFollowing() #31

* temp add

* add renew request

* data snapshot module

* add calculate digest service

* fix word cache clientid

* data renew module

* data renew/expired module

* add renew datum request

* add WriteDataAcceptor

* session renew/expired module

* 1. bugfix ReNewDatumHandler: getByConnectId -> getOwnByConnectId
2. refactor DatumCache from static to instance

* add blacklist wrapper and filter

* upgrade jraft version to 1.2.5

* blacklist ut

* add clientoff delay time

* bugfix: The timing of snapshot construction is not right

* rename: ReNew -> Renew

* fix blacklist test case

* rename: unpub -> unPub

* add threadSize and queueSize limit

* bugfix: revert SessionRegistry

* fix sub fetch retry all error,and reset datainfoid version

* fix client fast chain breakage data can not be cleaned up

* (1) remove logback.xml DEBUG level;
(2) dataServerBootstrapConfig rename;
(3) print conf when startup

* update log

* fix update zero version,and fix log

* add clientOffDelayMs default value

* fix clientOffDelayMs

* Task(DatumSnapshot/Pub/UnPub) add retry strategy

* bugfix DataNodeServiceImpl: retryTimes

* (1)cancelDataTaskListener duplicate
(2)bugfix DataNodeServiceImpl and SessionRegistry

* refactor datum version

* add hessian black list

* bugfix: log "retryTimes"

* bugfix DatumLeaseManager:  Consider the situation of connectId lose after data restart; ownConnectId should calculate dynamically

* add jvm blacklist api

* fix file name

* some code optimization

* data:refactor snapshot

* fix jetty version

* bugfix DatumLeaseManager: If in a non-working state, cannot clean up because the renew request cannot be received at this time.

* remove SessionSerialFilterResource

* WriteDataProcessor add TaskEvent log; Cache print task update

* data bugfix: snapshot must notify session

* fix SubscriberPushEmptyTask default implement

* merge new

* fix protect

* 1. When the pub of connectId is 0, no clearance action is triggered.
2. Print map.size regularly
3. Delete the log: "ConnectId (%s) expired, lastRenewTime is %s, pub.size is 0"

* DataNodeExchanger: print but ignore if from renew module, cause renew request is too much

* reduce log of renew

* data bugfix: Data coverage is also allowed when versions are equal. Consistent with session design.

* DatumCache bugfix: Index coverage should be updated after pubMap update

* DatumSnapshotHandler: limit print; do not call dataChangeEventCenter.onChange if no diff

* bugfix unpub npe (pub maybe already clean by DatumLeaseManager);LIMITED_LIST_SIZE_FOR_PRINT change to 30

* some code refactor

* add code comment

* fix data working to init,and fix empty push version

* consider unpub is isWriteRequest, Reduce Snapshot frequency

* RefreshUpdateTime is at the top, otherwise multiple snapshot can be issued concurrently

* update config: reduce retryTimes, increase delayTime, the purpose is to reduce performance consumption

* put resume() in finally code block, avoid lock leak

* modify renewDatumWheelTaskDelay and datumTimeToLiveSec

* When session receives a connection and generates renew tasks, it randomly delays different times to avoid everyone launching renew at the same time.

* data: add executor for handler
session: bugfix snapshot
session: refactor wheelTimer of renew to add executor

* add get data log

* snapshot and lastUpdateTimestamp: Specific to dataServerIP

* 1. DataServer: RenewDatumHandler must return GenericResponse but not CommonResponse, or else session will class cast exception
2. No need to update timestamp after renew
3. snapshot: Need to specify DataServerIP

* add logs

* 1. dataServer: reduce log of snapshotHandler
2. update logs

* dataServer: renew logic should delay for some time after status is WORKING, cause Data is processed asynchronously after synchronization from other DataServer

* bugfix bean; update log

* ignore renew request log

* fix UT

* fix .travis.yml

* fix version 5.3.0-SNAPSHOT

* fix online notify connect error

* fix push confirm error,and fix datum update version,pub threadpool config,add accesslimit service

* add switch renew and expire

* implement renew enable/disable switch

* fix data client exchange log

* fix datum fetch connect error

* bugfix CacheService: set version zero when first sub and get datum error

* fix clean task for fetch

* bugfix DatumCache: Forget to clean up the index in datumCache.putSnapshot

* fix fetch datum word cache

* fix test case time

* fix test case

* fix test case

* fix test case

* fix ut case: StopPushDataSwitchTest

* ut case:renew module

* fix ut case:TempPublisherTest

* bugfix ut case: increase sleep time

* fix ut case:RenewTest

* fix ut case:RenewTest format

* fix pom version

* fix ut case:do not run parallelly