This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Rewrite hadoop-ai regex to match gpu info #2681

Merged: mzmssg merged 2 commits into master from zimiao/rewrite_regx on May 7, 2019

Conversation

mzmssg (Member) commented Apr 26, 2019

TODO:
Acquire structured GPU info via nvidia-smi -q -x

@mzmssg mzmssg requested a review from abuccts April 26, 2019 11:28
@coveralls

Coverage Status

Coverage remained the same at 53.255% when pulling 1471e7b on zimiao/rewrite_regx into b232486 on master.

@@ -16,7 +16,7 @@ index 8801b4a940f..30d33086516 100644
*/
Pattern GPU_INFO_FORMAT =
- Pattern.compile("\\s+([0-9]{1,2})\\s+[\\s\\S]*\\s+(0|1|N/A|Off)\\s+");
+ Pattern.compile("\\s+([0-9]{1,2})\\s+[\\s\\S]*\\s+(\\d+|N/A|Off)\\s+");
+ Pattern.compile("[|]\\s+([0-9]{1,2})[^|]*[|][^|]*[|]\\s+(\\d+|N/A|Off)\\s+[|]");
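To see what the final pattern captures, here is a minimal sketch that runs it against one table row. The row is a hypothetical line in the default nvidia-smi table layout, not taken from this PR, and the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GpuInfoRegexDemo {
    // Pattern copied from the last "+" line of the diff above.
    static final Pattern GPU_INFO_FORMAT =
        Pattern.compile("[|]\\s+([0-9]{1,2})[^|]*[|][^|]*[|]\\s+(\\d+|N/A|Off)\\s+[|]");

    /** Returns {gpuIndex, thirdColumnValue} for the first matching row, if any. */
    static Optional<String[]> firstMatch(String text) {
        Matcher m = GPU_INFO_FORMAT.matcher(text);
        return m.find()
            ? Optional.of(new String[] {m.group(1), m.group(2)})
            : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample row (assumption): index 3, third column holds 12.
        String row =
            "|   3  Tesla K80           Off  | 00000000:00:04.0 Off |                   12 |";
        firstMatch(row).ifPresent(g ->
            System.out.println("gpu=" + g[0] + " value=" + g[1])); // prints gpu=3 value=12
    }
}
```

Anchoring each group between literal `|` separators is what keeps the second capture tied to the third column rather than to an earlier `Off` in the same row.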
Member

Please parse the structured output instead, e.g.

  • for gpu memory, use nvidia-smi -q -d MEMORY
Attached GPUs                       : 16
GPU 00000000:34:00.0
    FB Memory Usage
        Total                       : 32480 MiB
        Used                        : 0 MiB
        Free                        : 32480 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 2 MiB
        Free                        : 32766 MiB
  • for gpu ecc, use nvidia-smi -q -d ECC
Attached GPUs                       : 16
GPU 00000000:34:00.0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
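The indented `key : value` output quoted above can be walked line by line without a table regex. The following is only a sketch under assumptions: it treats indentation as nesting, treats a top-level line starting with `GPU ` as the start of a device block, and uses a hypothetical class name; it has not been checked against every nvidia-smi version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NvidiaSmiQuerySketch {

    /** Returns gpuId -> ("Section/.../Key" -> value) for `nvidia-smi -q -d ...` text. */
    static Map<String, Map<String, String>> parse(String output) {
        Map<String, Map<String, String>> gpus = new LinkedHashMap<>();
        Map<String, String> current = null;
        List<Integer> indents = new ArrayList<>(); // indentation of each open section
        List<String> names = new ArrayList<>();    // name of each open section

        for (String line : output.split("\\R")) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue;
            }
            int indent = line.indexOf(text);
            if (indent == 0 && text.startsWith("GPU ")) {
                // Start of a per-device block, e.g. "GPU 00000000:34:00.0".
                current = new LinkedHashMap<>();
                gpus.put(text.substring(4).trim(), current);
                indents.clear();
                names.clear();
                continue;
            }
            if (current == null) {
                continue; // global header lines such as "Attached GPUs"
            }
            // Close every section that is not a parent of this line.
            while (!indents.isEmpty() && indents.get(indents.size() - 1) >= indent) {
                indents.remove(indents.size() - 1);
                names.remove(names.size() - 1);
            }
            int colon = text.indexOf(':');
            if (colon < 0) {
                // A line without a colon opens a nested section.
                indents.add(indent);
                names.add(text);
            } else {
                String key = text.substring(0, colon).trim();
                String value = text.substring(colon + 1).trim();
                String prefix = String.join("/", names);
                current.put(prefix.isEmpty() ? key : prefix + "/" + key, value);
            }
        }
        return gpus;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Attached GPUs                       : 16",
            "GPU 00000000:34:00.0",
            "    FB Memory Usage",
            "        Total                       : 32480 MiB",
            "        Used                        : 0 MiB",
            "        Free                        : 32480 MiB");
        System.out.println(parse(sample).get("00000000:34:00.0").get("FB Memory Usage/Used"));
    }
}
```

The same walk covers the ECC listing, where a counter ends up under a path such as `ECC Errors/Volatile/Double Bit/Total`.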

Or use the XML output. Otherwise, the changes are useless.
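For the XML route, a rough sketch with the JDK's DOM parser could look like the following. The element and attribute names (`gpu`, its `id` attribute, `fb_memory_usage`, `used`) and the abridged sample document are assumptions about what `nvidia-smi -q -x` emits and should be checked against real output; the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NvidiaSmiXmlSketch {

    /** Returns gpu id -> used framebuffer memory, e.g. "0 MiB" (element names assumed). */
    static Map<String, String> usedFbMemoryByGpu(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Real output may reference a DTD file; do not try to load it.
            // This feature URI assumes the JDK's bundled Xerces-based parser.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            Map<String, String> result = new LinkedHashMap<>();
            NodeList gpus = doc.getElementsByTagName("gpu");
            for (int i = 0; i < gpus.getLength(); i++) {
                Element gpu = (Element) gpus.item(i);
                NodeList fb = gpu.getElementsByTagName("fb_memory_usage");
                if (fb.getLength() == 0) {
                    continue; // no framebuffer section for this device
                }
                NodeList used = ((Element) fb.item(0)).getElementsByTagName("used");
                if (used.getLength() > 0) {
                    result.put(gpu.getAttribute("id"), used.item(0).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable nvidia-smi XML", e);
        }
    }

    public static void main(String[] args) {
        // Abridged, hand-written sample (assumption), not captured from a real machine.
        String xml = "<nvidia_smi_log><gpu id=\"00000000:34:00.0\">"
            + "<fb_memory_usage><total>32480 MiB</total><used>0 MiB</used>"
            + "<free>32480 MiB</free></fb_memory_usage></gpu></nvidia_smi_log>";
        System.out.println(usedFbMemoryByGpu(xml));
    }
}
```

Looking values up by element name means a change in column widths or ordering does not affect the result, which is the main advantage over matching the table layout.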

Member Author

This is a mitigation. We could create an issue for the TODO items and evaluate their priority.

Member Author

Another question: we haven't found a detailed explanation of the structured output. There is a similar issue in #2534.

Member

You can find the output details in the nvidia-smi docs.

nvidia-smi (the NVIDIA System Management Interface) is built on top of the NVIDIA Management Library (NVML), which is intended to be backward compatible, and there are also Python bindings for it. It's better to use the NVML API to query the status.

@mzmssg mzmssg merged commit 0c61d64 into master May 7, 2019
@abuccts abuccts deleted the zimiao/rewrite_regx branch May 14, 2019 09:19