
[Enhancement] Increase number of btrfs quotas rescan #1624

Open
MFlyer opened this issue Jan 23, 2017 · 6 comments

@MFlyer
Member

MFlyer commented Jan 23, 2017

While checking share usage I found that we perform a quota rescan only when creating snapshots. So if users don't have scheduled snapshots (obviously we hope they have them!) or massively delete them, we can end up with wrong reporting on shares (and on quota limits in the future: 2015/* qgroups easily reaching their quota limit).

Option A:
add quota rescans when deleting snapshots too

Option B:
make it a supervised task every x (5-10?) mins

@schakrava & @phillxnet ?
While testing share usage and deleting some snapshots (no scheduled snapshots), my 2015/* qgroup kept growing when it wasn't expected to: when deleting snapshots with exclusive sizes, the 2015/* qgroup, being their "father", takes over those orphans, and it seems at least 2 rescans are required to come back to the real values.
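Option A could look something like the sketch below ('btrfs quota rescan -w' and 'btrfs subvolume delete' are real btrfs-progs commands; the helper names and the call site are hypothetical, just to illustrate rescanning right after deletion):

```python
import subprocess


def rescan_cmd(mnt_pt):
    # 'btrfs quota rescan -w' starts a quota rescan and waits for it to
    # finish, so usage read afterwards reflects the deleted snapshot.
    return ['btrfs', 'quota', 'rescan', '-w', mnt_pt]


def delete_snapshot(mnt_pt, snap_name):
    # Hypothetical call site: the existing deletion logic, followed by
    # an immediate quota rescan (Option A).
    subprocess.run(['btrfs', 'subvolume', 'delete',
                    '%s/%s' % (mnt_pt, snap_name)], check=True)
    subprocess.run(rescan_cmd(mnt_pt), check=True)
```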

@phillxnet
Member

@MFlyer Another nice find on your part.

My concern with solutions akin to Option B (periodic rescans) is that they will break drive power-down features. If it turns out we can't avoid this in order to maintain recent info, then maybe we could use the existing drive power state to guide these updates; i.e. akin to what smartmontools does, where it will not query a drive that is in standby but will only 'be nice' for a set number of attempts. Thereafter it will go ahead and wake the drive in the interest of ensuring a recent read of the drive's status.
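That smartmontools-style 'be nice' behaviour could be sketched as below (all names are hypothetical; how the standby state is actually read, e.g. via hdparm -C, is left abstract):

```python
# Skip the rescan for at most MAX_NICE_SKIPS consecutive periods while a
# drive reports standby; after that, rescan anyway (waking the drive) so
# the usage info cannot go stale indefinitely. All names hypothetical.
MAX_NICE_SKIPS = 3


def should_rescan(drive_in_standby, skips_so_far):
    """Return (do_rescan, new_skip_count) for one periodic attempt."""
    if drive_in_standby and skips_so_far < MAX_NICE_SKIPS:
        # Be nice: let the drive sleep and try again next period.
        return False, skips_so_far + 1
    # Drive is awake, or we've been nice for long enough: go ahead.
    return True, 0
```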

Also I don't see that these options are mutually exclusive and Option A seems like a good idea anyway.

Do we have to account for quota rescans potentially taking a long time (i.e. when there are many snapshots)?

Apologies if I have missed the point there.

@MFlyer
Member Author

MFlyer commented Feb 1, 2017

@phillxnet from btrfs changelog https://btrfs.wiki.kernel.org/index.php/Changelog#btrfs-progs_v4.9.1_.28Jan_2017.29

btrfs-progs v4.9.1 (Jan 2017)

check:

  • use correct inode number for lost+found files
  • lowmem mode: fix false alert on dropped leaf

size reports: negative numbers might appear in size reports during device deletes (previously in EiB units)
mkfs: print device being trimmed
defrag: v1 ioctl support dropped
quota: print message before starting to wait for rescan
qgroup show: new option to sync before printing the stats
other:

  • corrupt-block enhancements
  • backtrace and co. cleanups
  • doc fixes

Migrating to >=4.9 we can avoid an additional rescan task and have btrfs rescans while updating share usage (once every minute).
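With btrfs-progs >= 4.9 the existing per-minute share refresh could simply ask qgroup show to sync first ('btrfs qgroup show --sync' is the new option from the changelog above; the helper itself is hypothetical):

```python
def qgroup_show_cmd(mnt_pt, sync=True):
    # '--sync' (btrfs-progs >= 4.9) syncs the filesystem before printing,
    # so referenced/exclusive sizes are current without a separate rescan.
    cmd = ['btrfs', 'qgroup', 'show']
    if sync:
        cmd.append('--sync')
    return cmd + [mnt_pt]
```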

@MFlyer
Member Author

MFlyer commented Feb 1, 2017

Ref to @schakrava too; hands up for updating to the latest btrfs (please remember my tests with 4.9 worked fine).

Mirko

@phillxnet
Member

@MFlyer Yes I saw that one come up and meant to pop it in here for context. We still have the suspend issue for drives of course. Where is the 'every min' element enforced? Maybe we can have this configurable (if on our side) with a link to drive power down if relevant.

Thumbs up for the btrfs-progs update on my part, as it's the only way to go really, especially given your recent findings re issues with size reporting on our current version. Should we not also have our kernel updated to at least version 4.9 (elrepo ml now has 4.9.6-1)? My understanding is that it is best to keep the kernel version and btrfs-progs as close as we can.

@MFlyer
Member Author

MFlyer commented Feb 1, 2017

> We still have the suspend issue for drives of course. Where is the 'every min' element enforced? Maybe we can have this configurable (if on our side) with a link to drive power down if relevant.

The every-minute task is under data_collector, in update_storage_state (which hits refresh-share-state):

    def update_storage_state(self):
        # update storage state once a minute as long as
        # there is a client connected.
        while self.start:
            resources = [{'url': 'disks/scan',
                          'success': 'Disk state updated successfully',
                          'error': 'Failed to update disk state.'},
                         {'url': 'commands/refresh-pool-state',
                          'success': 'Pool state updated successfully',
                          'error': 'Failed to update pool state.'},
                         {'url': 'commands/refresh-share-state',
                          'success': 'Share state updated successfully',
                          'error': 'Failed to update share state.'},
                         {'url': 'commands/refresh-snapshot-state',
                          'success': 'Snapshot state updated successfully',
                          'error': 'Failed to update snapshot state.'}, ]
            for r in resources:
                try:
                    self.aw.api_call(r['url'], data=None, calltype='post',
                                     save_error=False)
                except Exception as e:
                    logger.error('%s. exception: %s'
                                 % (r['error'], e.__str__()))
            gevent.sleep(60)

We can link refresh-share-state to drive power-down (e.g. run every min with a conditional sync every x - 10? 20? 30? - mins).
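That conditional sync could sit on a simple cycle counter inside the existing 60-second loop (SYNC_EVERY and the helper are hypothetical; 20 is just one of the 10/20/30 candidates):

```python
SYNC_EVERY = 20  # full syncs every 20 of the 60-second cycles


def next_cycle(cycle):
    """Return (do_full_sync, next_counter) for one 60-second iteration."""
    # Cheap refresh runs every cycle; the expensive, drive-waking sync
    # only when the counter hits a multiple of SYNC_EVERY.
    do_sync = (cycle % SYNC_EVERY == 0)
    return do_sync, cycle + 1
```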

Totally agree with you on kernel & btrfs tools working together on 4.9, and the same for future releases.

Mirko

@MFlyer
Member Author

MFlyer commented Feb 1, 2017

Hi @phillxnet, amending my last one: I didn't think about the data collector's nature. Check this code:

Note: every RockstorIO obj under data_collector is a namespace attached to Rockstor's socket.io implementation, so on SysinfoNamespace (the obj handling shares and pools status too) we perform btrfs operations only with a client connected to the Rockstor WebUI (while start is True) and stop them as soon as clients disconnect, this granting a btrfs rescan only when someone is checking via the WebUI. Can we accept this? :)

Mirko

class SysinfoNamespace(RockstorIO):

    start = False
    supported_kernel = settings.SUPPORTED_KERNEL_VERSION

    # This function is run once on every connection
    def on_connect(self, sid, environ):

        self.aw = APIWrapper()
        self.emit('connected',
                  {
                      'key': 'sysinfo:connected',
                      'data': 'connected'
                  })
        self.start = True
        self.spawn(self.update_storage_state, sid)
        self.spawn(self.update_check, sid)
        self.spawn(self.update_rockons, sid)
        self.spawn(self.send_kernel_info, sid)
        self.spawn(self.prune_logs, sid)
        self.spawn(self.send_localtime, sid)
        self.spawn(self.send_uptime, sid)

    # Run on every disconnect
    def on_disconnect(self, sid):

        self.cleanup(sid)
        self.start = False

    def send_uptime(self):
        # Seems redundant
        while self.start:
            self.emit('uptime', {'key': 'sysinfo:uptime', 'data': uptime()})
            gevent.sleep(60)

    def send_localtime(self):

        while self.start:

            self.emit('localtime',
                      {
                          'key': 'sysinfo:localtime',
                          'data': time.strftime('%H:%M (%z %Z)')
                      })
            gevent.sleep(40)

    def send_kernel_info(self):

        try:
            self.emit('kernel_info',
                      {
                          'key': 'sysinfo:kernel_info',
                          'data': kernel_info(self.supported_kernel)
                      })
        except Exception as e:
            logger.error('Exception while gathering kernel info: %s' %
                         e.__str__())
            # Emit an event to the front end to capture error report
            self.emit('kernel_error', {
                'key': 'sysinfo:kernel_error', 'data': str(e)})
            self.error('unsupported_kernel', str(e))

    def update_rockons(self):

        try:
            self.aw.api_call('rockons/update', data=None, calltype='post',
                             save_error=False)
        except Exception as e:
            logger.error('failed to update Rock-on metadata. low-level '
                         'exception: %s' % e.__str__())

    def update_storage_state(self):
        # update storage state once a minute as long as
        # there is a client connected.
        while self.start:
            resources = [{'url': 'disks/scan',
                          'success': 'Disk state updated successfully',
                          'error': 'Failed to update disk state.'},
                         {'url': 'commands/refresh-pool-state',
                          'success': 'Pool state updated successfully',
                          'error': 'Failed to update pool state.'},
                         {'url': 'commands/refresh-share-state',
                          'success': 'Share state updated successfully',
                          'error': 'Failed to update share state.'},
                         {'url': 'commands/refresh-snapshot-state',
                          'success': 'Snapshot state updated successfully',
                          'error': 'Failed to update snapshot state.'}, ]
            for r in resources:
                try:
                    self.aw.api_call(r['url'], data=None, calltype='post',
                                     save_error=False)
                except Exception as e:
                    logger.error('%s. exception: %s'
                                 % (r['error'], e.__str__()))
            gevent.sleep(60)

    def update_check(self):

        uinfo = update_check()
        self.emit('software_update',
                  {
                      'key': 'sysinfo:software_update',
                      'data': uinfo
                  })

    def prune_logs(self):

        while self.start:
            self.aw.api_call('sm/tasks/log/prune', data=None, calltype='post',
                             save_error=False)
            gevent.sleep(3600)

Alternative/enhancement: while users are connected, run update_storage_state with the current 60-sec sleep; while users are disconnected, perform it anyway (currently we don't do that!) but every 30/60/120 mins, to guarantee fs status updates.
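That alternative could be as small as varying the sleep by connection state (intervals and names hypothetical; self.start is the existing connected flag, and the loop condition would become unconditional):

```python
CONNECTED_SLEEP = 60            # seconds: current behaviour with a client
DISCONNECTED_SLEEP = 60 * 60    # 30/60/120 mins? one hour picked here


def sleep_interval(client_connected):
    # The refresh loop would run regardless of clients; only the sleep
    # length depends on whether a WebUI client is connected.
    return CONNECTED_SLEEP if client_connected else DISCONNECTED_SLEEP
```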

@schakrava schakrava added this to the Point Bonita milestone Mar 24, 2017
@schakrava schakrava modified the milestones: Point Bonita, After Six Nov 7, 2017
@phillxnet phillxnet removed this from the After Six milestone Jan 23, 2021