Code and repo audit #160

apohllo · 2023-11-24T13:21:21Z

README.md

I would suggest "NameGuard by NameHash" as a title; the current title might be confusing.

apohllo · 2023-11-24T15:07:35Z

More context could be given in the README, e.g. why NameGuard could be useful. Some potential use cases would be valuable.

apohllo · 2023-11-24T15:08:12Z

In the impersonation section, giving an actual comparison between safe and unsafe names would be very valuable.

apohllo · 2023-11-24T15:11:47Z

API/README.md
I tried running

pip install nameguard

and got:

ERROR: Could not find a version that satisfies the requirement nameguard (from versions: none)
ERROR: No matching distribution found for nameguard

My env is as follows:
Python: 3.11.3

apohllo · 2023-11-24T15:13:20Z

apohllo · 2023-11-24T15:15:42Z

If the PyPI package is not yet available, consider extending the README with instructions on how to install the library in the meantime, using Poetry.

apohllo · 2023-11-24T15:20:57Z

Append | python -m json.tool to the curl call; this will make the result much more readable.

apohllo · 2023-11-24T15:23:57Z

I don't get the difference between:

and

The messages are almost the same.

OK, so one refers to the name as a whole, and the other to individual parts of the name. This should be reflected better in the message. Maybe the "label" message should use the plural.

apohllo · 2023-11-24T15:24:53Z

This one is pretty obscure (at least looking at the message):

apohllo · 2023-11-24T15:34:48Z

At the end when running the non-monkey-patched version, I've got a bunch of errors:

I guess some additional service should be running on our machine:

This should be mentioned in the readme.

Update: the README should say that if you get these errors, you need to set up the API keys mentioned above.

apohllo · 2023-11-24T15:39:02Z

In the project definition file

nameguard/api/pyproject.toml

Lines 5 to 10 in daafc6a

    
           authors = ["NameHash Team <devops@namehash.io>"]
           maintainers = ["NameHash Team <devops@namehash.io>"]
           homepage = "https://github.com/namehash/nameguard"
           repository = "https://github.com/namehash/nameguard"
           readme = "README.md"
           license = "LICENSE"

Are these emails maintained?
It would be better to provide emails to real people, not only the organization catch-all emails.
Homepage should be updated
License should be changed to MIT

apohllo · 2023-11-24T15:50:17Z

Why is the name conf used here?

nameguard/api/nameguard/grapheme_normalization.py

Lines 5 to 7 in daafc6a

    
           def grapheme_is_normalized(conf: str) -> bool:
               if len(conf) == 1 and ord(conf) in NORMALIZATION.valid:
                   return True

Why not grapheme?

apohllo · 2023-11-30T17:08:15Z

I would suggest merging grapheme_normalization with generic_utils.

There's also a utils file, which seems to have a similar purpose.

apohllo · 2023-11-30T17:09:30Z

Setting DEBUG as the default log level does not seem to suit a general audience:

nameguard/api/nameguard/logging.py

Line 5 in daafc6a

logger.setLevel(logging.DEBUG)
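
One possible way to address this is sketched below: default to INFO and let users opt into more verbose output. The environment variable name NAMEGUARD_LOG_LEVEL and the function configure_logger are hypothetical and used only for illustration; they are not part of the audited code.

```python
import logging
import os


def configure_logger(name: str = 'nameguard') -> logging.Logger:
    """Return a logger whose level defaults to INFO.

    NAMEGUARD_LOG_LEVEL is a hypothetical environment variable; it is an
    assumption made for this sketch, not something the project defines.
    """
    level_name = os.environ.get('NAMEGUARD_LOG_LEVEL', 'INFO').upper()
    logger = logging.getLogger(name)
    # setLevel accepts standard level names such as 'DEBUG' or 'INFO'
    # and raises ValueError for an unknown name.
    logger.setLevel(level_name)
    return logger
```

With this approach a developer can still set the variable to DEBUG locally, while library users get a quieter default.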

apohllo · 2023-11-30T17:14:37Z

This regex is not resistant against new-line injection.

nameguard/api/nameguard/nameguard.py

Line 87 in daafc6a

ALCHEMY_UNKNOWN_NAME = re.compile('^\[0x[0-9a-f]{4}\.\.\.[0-9a-f]{4}\]\.eth$')

The anchors should be changed to \A and \Z (Python's re module spells the strict end-of-string anchor \Z).
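
A minimal sketch of the problem follows. In Python's re module, $ also matches just before a trailing newline, whereas \Z matches only at the very end of the string. The names LOOSE and STRICT are illustrative; the pattern body is copied from the line quoted above and written as a raw string.

```python
import re

# Pattern as quoted above, anchored with ^ and $.
LOOSE = re.compile(r'^\[0x[0-9a-f]{4}\.\.\.[0-9a-f]{4}\]\.eth$')

# Same pattern, anchored with \A and \Z.
STRICT = re.compile(r'\A\[0x[0-9a-f]{4}\.\.\.[0-9a-f]{4}\]\.eth\Z')

name = '[0x1234...abcd].eth'
injected = name + '\n'  # trailing newline appended to the name

# Both patterns accept the clean name.
assert LOOSE.match(name) is not None
assert STRICT.match(name) is not None

# '$' matches before the final newline, so the loose pattern accepts it.
assert LOOSE.match(injected) is not None

# '\Z' only matches at the true end of the string, so this is rejected.
assert STRICT.match(injected) is None
```

Calling fullmatch on the pattern without any anchors would be an alternative way to get the same strict behaviour.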

apohllo · 2023-11-30T17:18:31Z

Since the addresses have different meanings, why not use a dictionary?

nameguard/api/nameguard/nameguard.py

Lines 82 to 85 in daafc6a

    
           ens_contract_adresses = {
               '0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85',  # Base Registrar
               '0xd4416b13d2b3a9abae7acd5d6c2bbdbe25686401',  # Name Wrapper
           }
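
A sketch of the dictionary variant, assuming the comments in the quoted code describe each contract's role. The constant name ENS_CONTRACT_ADDRESSES and the helper contract_role are illustrative, not taken from the repository.

```python
from typing import Optional

# Address -> role, using the roles given in the comments of the quoted code.
ENS_CONTRACT_ADDRESSES = {
    '0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85': 'Base Registrar',
    '0xd4416b13d2b3a9abae7acd5d6c2bbdbe25686401': 'Name Wrapper',
}


def contract_role(address: str) -> Optional[str]:
    """Return the role of a known ENS contract, or None if it is unknown."""
    return ENS_CONTRACT_ADDRESSES.get(address.lower())
```

Membership tests such as `address in ENS_CONTRACT_ADDRESSES` keep working unchanged, so existing call sites would not need to be rewritten.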

apohllo · 2023-11-30T17:20:13Z

Consider the missing-key case, which would generate an obscure error message for a nested key:

nameguard/api/nameguard/nameguard.py

Lines 89 to 92 in daafc6a

    
           def nested_get(dic, keys):
               for key in keys:
                   dic = dic[key]
               return dic

Maybe the error should show all elements of the keys list, or the function should return None if the path does not exist?
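
Both options can be sketched as follows. The default parameter and the nested_get_strict variant are assumptions made for illustration; only the name nested_get comes from the quoted code.

```python
from collections.abc import Mapping
from typing import Any, Sequence


def nested_get(dic: Any, keys: Sequence[str], default: Any = None) -> Any:
    """Follow keys through nested mappings; return default if the path is missing."""
    current = dic
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


def nested_get_strict(dic: Any, keys: Sequence[str]) -> Any:
    """Like nested_get, but raise a KeyError that names the whole path."""
    current = dic
    for depth, key in enumerate(keys):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError) as exc:
            path = '.'.join(map(str, keys))
            raise KeyError(
                f'path {path!r} not found (failed at {key!r}, depth {depth})'
            ) from exc
    return current
```

The first variant lets callers drop their try/except blocks entirely; the second keeps the exception but makes the message point at the full path rather than only the innermost key.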

apohllo · 2023-11-30T17:22:37Z

Two-letter instance variable which is not documented:

nameguard/api/nameguard/nameguard.py

Line 100 in daafc6a

self.ns = {}

It is hard to understand its meaning from the name itself (name-service?).

self.services or self.name_services would be a better name.

apohllo · 2023-11-30T17:28:33Z

Missing return value type:

nameguard/api/nameguard/nameguard.py

Lines 108 to 109 in daafc6a

    
           def analyse_label(self, label: str):
               return self._inspector.analyse_label(label, simple_confusables=True, omit_cure=True)

apohllo · 2023-11-30T17:32:16Z

The doc says namehashes won't be analyzed, while in fact they might be resolved (according to the argument):

nameguard/api/nameguard/nameguard.py

Lines 112 to 116 in daafc6a

    
                   '''
                   Inspect a name. A name is a sequence of labels separated by dots.
                   A label can be a labelhash or a string.
                   If a labelhash is encountered, it will be treated as an unknown label.
                   '''

Consider changing the doc.

apohllo · 2023-11-30T17:34:55Z

What about names such as ...?

nameguard/api/nameguard/nameguard.py

Line 123 in daafc6a

labels = [] if len(name) == 0 else name.split('.')

Should empty labels always be removed? Or is an empty string OK in that case?
If we allow empty labels in the call, why don't we do the same for an empty name?
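
The behaviour in question can be reproduced in isolation. The function name split_labels is illustrative; its body is the expression from the quoted line.

```python
from typing import List


def split_labels(name: str) -> List[str]:
    # Same expression as the quoted line, wrapped in a function.
    return [] if len(name) == 0 else name.split('.')


# The empty name is special-cased to zero labels...
assert split_labels('') == []

# ...but a name made only of dots yields four empty labels,
assert split_labels('...') == ['', '', '', '']

# and an empty label in the middle of a name is kept as well.
assert split_labels('a..eth') == ['a', '', 'eth']
```

So empty labels are dropped only when the whole name is empty, which is the inconsistency raised above.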

apohllo · 2023-11-30T17:48:21Z

I would suggest renaming label_analysis to label:

nameguard/api/nameguard/nameguard.py

Lines 136 to 143 in daafc6a

    
           labels_graphemes_checks = [
               [
                   [check(grapheme) for check in GRAPHEME_CHECKS]
                   for grapheme in label_analysis.graphemes
               ] if label_analysis is not None else []
               # label has [] graphemes if it's a labelhash
               for label_analysis in labels_analysis
           ]

Currently there's little difference between label_analysis and labels_analysis, which makes the code harder to understand. The call label.graphemes would be even more natural.

apohllo · 2023-11-30T17:49:35Z

The same remark applies to the following code:

nameguard/api/nameguard/nameguard.py

Lines 146 to 150 in daafc6a

    
           labels_checks = [
               [check(label_analysis) for check in LABEL_CHECKS]
               # checks have to handle labelhashes
               for label_analysis in labels_analysis
           ]

label would be better than label_analysis.

apohllo · 2023-11-30T18:01:44Z

Do the DNA checks require previous aggregation of results?

nameguard/api/nameguard/nameguard.py

Lines 170 to 176 in daafc6a

    
           for check_g, check_l, check_n in DNA_CHECKS:
               for label_i, label_analysis in enumerate(labels_analysis):
                   if label_analysis is not None:
                       for grapheme_i, grapheme in enumerate(label_analysis.graphemes):
                           labels_graphemes_checks[label_i][grapheme_i].append(check_g(grapheme))
                   labels_checks[label_i].append(check_l(label_analysis))
               name_checks.append(check_n(labels_analysis))

I don't understand why the application and organization of these checks differ from the other checks.
A short comment would be helpful.

apohllo · 2023-11-30T18:04:12Z

Taking into account that this is a constructor:

nameguard/api/nameguard/nameguard.py

Lines 180 to 184 in daafc6a

    
           return NameGuardReport(
               name=name,
               namehash=namehash_from_name(name),
               normalization=Normalization.UNKNOWN
               if any(label_analysis is None for label_analysis in labels_analysis)

it's very strange that a single call to it occupies more than 60 lines.

Have you considered a fluent API, a factory pattern, or preparing the arguments up front?
It would then also be possible to move the logic that currently sits inside this long call into dedicated methods, one per constructor argument.

apohllo · 2023-12-01T14:41:23Z

The part of the code:

nameguard/api/nameguard/nameguard.py

Lines 184 to 188 in daafc6a

    
           if any(label_analysis is None for label_analysis in labels_analysis)
           else Normalization.NORMALIZED
           if all(label_analysis.status == 'normalized' and len(label_analysis.label) > 0
                  for label_analysis in labels_analysis)
           else Normalization.UNNORMALIZED,

is rather hard to follow, since the values assigned to the parameter are intertwined with the logic that computes them. A method computing the outcome would be much more readable.
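
A sketch of such helpers, using stand-in types so that it runs on its own. The Normalization enum and LabelAnalysis dataclass below only mimic the attributes visible in the quoted code (status, label); the real classes live in the repository and may differ. The function names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Normalization(Enum):  # stand-in for the project's enum
    NORMALIZED = 'normalized'
    UNNORMALIZED = 'unnormalized'
    UNKNOWN = 'unknown'


@dataclass
class LabelAnalysis:  # stand-in exposing only the fields used here
    label: str
    status: str


def label_normalization(analysis: Optional[LabelAnalysis]) -> Normalization:
    """Normalization of one label; None stands for a labelhash."""
    if analysis is None:
        return Normalization.UNKNOWN
    if analysis.status == 'normalized' and len(analysis.label) > 0:
        return Normalization.NORMALIZED
    return Normalization.UNNORMALIZED


def name_normalization(analyses: Sequence[Optional[LabelAnalysis]]) -> Normalization:
    """Normalization of a whole name, following the quoted conditional."""
    if any(a is None for a in analyses):
        return Normalization.UNKNOWN
    if all(label_normalization(a) is Normalization.NORMALIZED for a in analyses):
        return Normalization.NORMALIZED
    return Normalization.UNNORMALIZED
```

The constructor call would then shrink to normalization=name_normalization(labels_analysis), and the per-label case could reuse label_normalization.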

apohllo · 2023-12-01T14:42:15Z

It would be better to move the logic here:

nameguard/api/nameguard/nameguard.py

Lines 193 to 199 in daafc6a

    
           canonical_name=compute_canonical_from_list(
               [label_analysis.normalized_canonical_label
                if label_analysis is not None
                else labels[i] # labelhash
                for i, label_analysis in enumerate(labels_analysis)],
                sep='.',
           ),

into a separate method that accepts labels_analysis.

apohllo · 2023-12-01T14:43:36Z

The code

nameguard/api/nameguard/nameguard.py

Lines 206 to 210 in daafc6a

    
           normalization=Normalization.UNKNOWN
           if label_analysis is None
           else Normalization.NORMALIZED
           if label_analysis.status == 'normalized' and len(label_analysis.label) > 0
           else Normalization.UNNORMALIZED,

has the same interleaved logic/value computation, which is hard to follow.

apohllo · 2023-12-01T14:44:17Z

The same applies to this piece of code:

nameguard/api/nameguard/nameguard.py

Lines 218 to 221 in daafc6a

    
           normalization=GraphemeNormalization.NORMALIZED
           if any(check.status == CheckStatus.PASS and check.check is Check.NORMALIZED
                  for check in grapheme_checks)
           else GraphemeNormalization.UNNORMALIZED,

apohllo · 2023-12-01T14:45:43Z

I understand that this is Python-style:

nameguard/api/nameguard/nameguard.py

Lines 235 to 239 in daafc6a

    
           for label, label_analysis, label_checks, label_graphemes_checks in zip(
               labels,
               labels_analysis,
               labels_checks,
               labels_graphemes_checks,

but putting the loop at the end once again makes the code harder to understand, since the variables are defined after they are used.

I would suggest a class/struct that would combine the individual pieces of information into one object.
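
One way this could look, as a rough sketch: a small dataclass that bundles the four parallel lists into one object per label. The names LabelContext and build_label_contexts are hypothetical, and Any is used wherever the real project types are not shown in the quoted code.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass
class LabelContext:
    """Everything known about a single label, gathered in one place."""
    label: str
    analysis: Optional[Any]  # None when the label is a labelhash
    checks: List[Any] = field(default_factory=list)
    graphemes_checks: List[List[Any]] = field(default_factory=list)


def build_label_contexts(
    labels: Sequence[str],
    labels_analysis: Sequence[Optional[Any]],
    labels_checks: Sequence[List[Any]],
    labels_graphemes_checks: Sequence[List[List[Any]]],
) -> List[LabelContext]:
    """Zip the parallel lists once, up front, into LabelContext objects."""
    return [
        LabelContext(label, analysis, checks, graphemes_checks)
        for label, analysis, checks, graphemes_checks in zip(
            labels, labels_analysis, labels_checks, labels_graphemes_checks
        )
    ]
```

Later code could then iterate over the contexts and read ctx.label or ctx.checks, so every variable is defined before it is used.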

apohllo · 2023-12-01T14:53:05Z

This code and many similar pieces of code could be simplified from

nameguard/api/nameguard/nameguard.py

Lines 296 to 298 in daafc6a

    
           ([self._inspect_confusable(c)
            for c in grapheme_analysis.confusables_other]
            if grapheme_analysis.confusables_other else []),

to

[self._inspect_confusable(c) for c in (grapheme_analysis.confusables_other or [])],

apohllo · 2023-12-01T14:55:42Z

Here

nameguard/api/nameguard/nameguard.py

Line 304 in daafc6a

    
           grapheme_checks = [check(grapheme) for check in GRAPHEME_CHECKS + [c[0] for c in DNA_CHECKS]]

grapheme refers to the grapheme analysis result. It would be better to make the names more consistent.
Cf. this code:

nameguard/api/nameguard/nameguard.py

Lines 277 to 278 in daafc6a

    
           grapheme_analysis = label_analysis.graphemes[0]
           grapheme_checks = [check(grapheme_analysis) for check in GRAPHEME_CHECKS + [c[0] for c in DNA_CHECKS]]

There's also some non-trivial logic shared by those methods.

apohllo · 2023-12-01T15:00:58Z

For me the name of the method:

nameguard/api/nameguard/nameguard.py

Line 321 in daafc6a

    
           async def secure_primary_name(self, address: str, network_name: str) -> SecurePrimaryNameResult:

implies that we are securing some name, but the logic is about checking whether the name is secure.
A name with a verb that refers to the process, such as validate_security or is_name_secure, would make the method's semantics easier to infer.

apohllo · 2023-12-01T15:02:49Z

Would it be possible to rewrite this code:

nameguard/api/nameguard/nameguard.py

Lines 329 to 345 in daafc6a

    
           if domain is None:
               status = SecurePrimaryNameStatus.NO_PRIMARY_NAME
               impersonation_status = None
           else:
               nameguard_result = await self.inspect_name(network_name, domain)
               result = ens_process(domain, do_normalize=True, do_beautify=True)
               if result.normalized != domain:
                   status = SecurePrimaryNameStatus.UNNORMALIZED
                   impersonation_status = None
               else:
                   display_name = result.beautified
                   status = SecurePrimaryNameStatus.NORMALIZED
                   primary_name = domain
                   impersonation_status = ImpersonationStatus.UNLIKELY if any(check.check == 'impersonation_risk' and check.status == CheckStatus.PASS for check in
                       nameguard_result.checks) else ImpersonationStatus.POTENTIAL

in a way that applies all checks step by step, in the same coding style?
It's pretty hard to understand when a given status will be returned for the name.
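
A sketch of a step-by-step version with early returns. The enums below are stand-ins with the member names from the quoted code, and classify_primary_name is a hypothetical pure helper: it takes the already computed normalized name and the outcome of the impersonation check as inputs, so it can be read and tested without the async and ENS machinery.

```python
from enum import Enum
from typing import Optional, Tuple


class SecurePrimaryNameStatus(Enum):  # stand-in for the project's enum
    NO_PRIMARY_NAME = 'no_primary_name'
    UNNORMALIZED = 'unnormalized'
    NORMALIZED = 'normalized'


class ImpersonationStatus(Enum):  # stand-in for the project's enum
    UNLIKELY = 'unlikely'
    POTENTIAL = 'potential'


def classify_primary_name(
    domain: Optional[str],
    normalized: Optional[str],
    impersonation_check_passed: bool,
) -> Tuple[SecurePrimaryNameStatus, Optional[ImpersonationStatus]]:
    """Return (status, impersonation_status), one rule per step."""
    # Step 1: the address has no primary name at all.
    if domain is None:
        return SecurePrimaryNameStatus.NO_PRIMARY_NAME, None
    # Step 2: the primary name is not in normalized form.
    if normalized != domain:
        return SecurePrimaryNameStatus.UNNORMALIZED, None
    # Step 3: normalized name; report the impersonation check outcome.
    if impersonation_check_passed:
        return SecurePrimaryNameStatus.NORMALIZED, ImpersonationStatus.UNLIKELY
    return SecurePrimaryNameStatus.NORMALIZED, ImpersonationStatus.POTENTIAL
```

Each status now has exactly one return statement, which makes it easier to see when it is produced.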

apohllo · 2023-12-01T15:04:42Z

As far as I understand:

nameguard/api/nameguard/nameguard.py

Lines 370 to 377 in daafc6a

    
           if token_type not in ['ERC721', 'ERC1155'] and contract_address in ens_contract_adresses:
               return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT, nameguard_result=None, investigated_fields=None)
           if token_type == 'NOT_A_CONTRACT':
               return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT, nameguard_result=None, investigated_fields=None)
           elif token_type == 'NO_SUPPORTED_NFT_STANDARD':
               return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT, nameguard_result=None, investigated_fields=None)
           elif token_type not in ['ERC721', 'ERC1155']:  # Alchemy does not support other types
               return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT, nameguard_result=None, investigated_fields=None)

all checks result in the same value being returned. Maybe using or would make the code simpler?
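
Judging only by the lines quoted above, every branch fires when token_type is not one of the two supported standards, so the four conditions appear to reduce to a single membership test. The sketch below checks that claim against a copy of the original branching; both function names and SUPPORTED_TOKEN_TYPES are illustrative.

```python
SUPPORTED_TOKEN_TYPES = frozenset({'ERC721', 'ERC1155'})


def is_unknown_nft(token_type: str) -> bool:
    """Collapsed form: anything outside the supported standards is unknown."""
    return token_type not in SUPPORTED_TOKEN_TYPES


def is_unknown_nft_original(token_type: str, contract_address: str,
                            ens_addresses: frozenset) -> bool:
    """Branch-for-branch copy of the quoted conditions, for comparison only."""
    if token_type not in ['ERC721', 'ERC1155'] and contract_address in ens_addresses:
        return True
    if token_type == 'NOT_A_CONTRACT':
        return True
    elif token_type == 'NO_SUPPORTED_NFT_STANDARD':
        return True
    elif token_type not in ['ERC721', 'ERC1155']:
        return True
    return False


# The two forms agree for every combination tried here.
addresses = frozenset({'0xabc'})
for token in ['ERC721', 'ERC1155', 'NOT_A_CONTRACT',
              'NO_SUPPORTED_NFT_STANDARD', 'ERC20']:
    for address in ['0xabc', '0xdef']:
        assert is_unknown_nft(token) == is_unknown_nft_original(token, address, addresses)
```

If the distinct branches are meant to be kept for documentation purposes, a comment listing the known token_type values next to the single test would serve the same role.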

apohllo · 2023-12-01T15:05:51Z

As mentioned earlier

nameguard/api/nameguard/nameguard.py

Lines 385 to 389 in daafc6a

    
           try:
               name = nested_get(res_json, keys)
               investigated_fields['.'.join(keys)] = name
           except KeyError:
               pass

it would be better to change the logic so that it does not raise a KeyError.

apohllo · 2023-12-01T15:07:16Z

Else is not needed

nameguard/api/nameguard/nameguard.py

Lines 409 to 413 in daafc6a

    
           if title is None:
               return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT, nameguard_result=None,
                                             investigated_fields=None)
           else:
               if is_labelhash_eth(title):

apohllo · 2023-12-01T15:08:38Z

Making nameguard_result and investigated_fields default to None would simplify a lot of code:

nameguard/api/nameguard/nameguard.py

Line 371 in daafc6a

    
           return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT, nameguard_result=None, investigated_fields=None)
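
A sketch of the idea with a plain dataclass as a stand-in. The real FakeEthNameCheckResult may be defined differently (for example as a Pydantic model), and the enum below lists only the two members that appear in the quoted code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class FakeEthNameCheckStatus(Enum):  # stand-in with two of the statuses
    UNKNOWN_NFT = 'unknown_nft'
    AUTHENTIC_ETH_NAME = 'authentic_eth_name'


@dataclass
class FakeEthNameCheckResult:  # stand-in for the project's result type
    status: FakeEthNameCheckStatus
    nameguard_result: Optional[Any] = None
    investigated_fields: Optional[Dict[str, str]] = None


# The repeated call then shrinks to a single argument.
result = FakeEthNameCheckResult(status=FakeEthNameCheckStatus.UNKNOWN_NFT)
```

Call sites that do have a report or investigated fields can still pass them by keyword.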

apohllo · 2023-12-01T15:09:17Z

Else is not needed:

nameguard/api/nameguard/nameguard.py

Lines 420 to 423 in daafc6a

    
           if is_ens_normalized(title):
               return FakeEthNameCheckResult(status=FakeEthNameCheckStatus.AUTHENTIC_ETH_NAME,
                                             nameguard_result=report, investigated_fields=None)
           else:

apohllo · 2023-12-01T15:10:24Z

Once again, else is not needed, since all previous paths end in a return and exit the method.

nameguard/api/nameguard/nameguard.py

Lines 426 to 427 in daafc6a

    
           else:
               impersonating_fields = {}

apohllo changed the title ~~Testing issue~~ Code audit Nov 24, 2023

apohllo changed the title ~~Code audit~~ Code and repo audit Nov 24, 2023
