Add Royal Mail tracking number recognition #21

Closed
gpg0 opened this issue Jan 6, 2016 · 19 comments

Comments

@gpg0

gpg0 commented Jan 6, 2016

Please could you add Royal Mail tracking number recognition? Thanks. You have the info here: http://www.royalmail.com/sites/default/files/COSS_Spec_03_-_Barcodes_and_Tracking_Numbers_v1_1.pdf

@adgaudio

adgaudio commented Jun 6, 2017

Hi @jkeen and @pasku

I am not a Ruby developer, so forgive me if this is an inappropriate place to post.

I just created a port of some of this library for Java and added basic support for Royal Mail. Perhaps these links can help you:

RoyalMail.java
RoyalMailTest.java

@jkeen
Owner

jkeen commented Jun 7, 2017

Those files were helpful, but I ran into a problem I'm not sure how to overcome currently.

Royal Mail and USPS (13) use exactly the same format, [A-Z]{2}[0-9]{9}[A-Z]{2}, and the same check digit calculation. From my research it looks like the two services operate together, and once a Royal Mail item arrives in the US, it can be tracked via USPS.

The documentation on Royal Mail says

1. Royal mail reference number is a 13 digit one.
2. It contains two letters initially eg:BD.
3. And ends with the two letters “GB” if the package is sent from the Great Britain.
4. Middle digits are comprised of 9 numbers.
5. Your reference is the barcode number present on the package and looks like this (AA 0000 0000 0GB).
6. Don’t worry if your reference number doesn’t end with “GB”, it might have been sent from the other country, not from Great Britain.

So right now it thinks some of the test cases in @adgaudio's Java files are USPS numbers because of the identical format. I could check for GB at the end to catch most of the cases, but the edge cases where Royal Mail was sent from a different country… not sure how to handle that right now.

@jkeen
Owner

jkeen commented Jun 8, 2017

I figured this out ⬆️, gonna start assuming 13 digit USPS numbers end in US, and Royal Mail numbers end in GB
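
A minimal sketch of that rule, assuming the 13-character format quoted above. `S10_SHAPE` and `guess_carrier` are hypothetical names for illustration, not part of tracking_number, and no checksum is verified here:

```ruby
# Hypothetical helper: classify a 13-character number by its two-letter suffix.
# Format taken from the thread: two letters, nine digits, two letters.
S10_SHAPE = /\A[A-Z]{2}[0-9]{9}[A-Z]{2}\z/

def guess_carrier(tracking_number)
  # Strip the spaces used in printed references such as "AA 0000 0000 0GB".
  normalized = tracking_number.to_s.gsub(/\s+/, "").upcase
  return nil unless normalized.match?(S10_SHAPE)

  case normalized[-2, 2]
  when "US" then :usps
  when "GB" then :royal_mail
  end # any other suffix falls through and returns nil
end

guess_carrier("RR473124829GB")    # => :royal_mail
guess_carrier("AA 0000 0000 0GB") # => :royal_mail
guess_carrier("RR473124829IN")    # => nil
```

Numbers with any other suffix are left unassigned, which is the trade-off being accepted here.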

@adgaudio

adgaudio commented Jun 9, 2017

Good catch! Sorry for my delay.

I guess this solution means that inbound international mail will not be assigned either courier (USPS or Royal Mail). But domestic mail and outbound international mail will be just fine. This seems good to me as it's the best we can do.


Somewhat tangentially, I was looking at USPS documentation for check digits, and they say that:

  • domestic mail may use their MOD 11 algorithm
  • domestic priority express mail only uses their MOD 10 algorithm

I think both reference a 13 character tracking number.

http://about.usps.com/publications/pub97/pub97_appj_020.htm
http://about.usps.com/publications/pub97/pub97_appj_021.htm

So should we also add another USPS13 detector that works with mod11 to differentiate between priority express and domestic mail? I'm not familiar with the way USPS structures tracking numbers outside of what I ported from your code.

@adgaudio

adgaudio commented Jun 9, 2017

Actually, this is cool.

  • Royal Mail uses MOD 11 for all mail (international and domestic).

If we can assume USPS13 always applies to domestic mail (tracking number ending in "US" and validating against the mod 10 or mod 11 algorithm), then I think we can assume that all other 13 digit tracking numbers matching the mod 11 checksum are Royal Mail, right? In this case, we would mislabel international Royal Mail postage coming from the US as a USPS domestic tracking number, but at least we would be able to detect Royal Mail's inbound international mail from other countries.

Does that make sense? If it does, I'm wondering how to proceed: Do you think it's worth being able to identify inbound international Royal Mail at the expense of a couple incorrectly labeled USPS tracking numbers, or is it better simply to not detect when Royal Mail is the courier for international mail going into GB?

@jkeen
Owner

jkeen commented Jun 9, 2017

"Royal mail uses MOD 11 for all mail (international and domestic)."

Does Mod 11 always have weighting? I thought theirs used some weighting. For USPS13, I can't find the original spec (USPS doesn't seem to know about 301 redirects and the links are broken), but I'm not sure why the case statement below is necessary in my existing checksum calculation:

  # mod 11 with 86423597 weighting
  def valid_checksum?
    sequence = tracking_number.scan(/[0-9]+/).flatten.join # all the numbers
    chars = sequence.chars
    check_digit = chars.pop.to_i

    sum = 0
    chars.zip([8, 6, 4, 2, 3, 5, 9, 7]).each do |digit, weight|
      sum += digit.to_i * weight
    end

    remainder = sum % 11
    # -> It must have said something about this in the documentation?
    check = case remainder
            when 1 then 0
            when 0 then 5
            else 11 - remainder
            end

    check == check_digit
  end

Other tracking numbers use a similar algorithm (which they also call Mod11) but without that case statement.

I think the tricky part about the 13 digit tracking numbers is that they seem to be part of a common spec shared across many shippers. For example, India Post also seems to use a 13 digit number, but ending in "IN" (and the examples continue: Thailand, same but ending with "TH"). So maybe the solution is to abstract it into some other class with a dynamic carrier based on the last two letters, or to lock down the USPS 13 digit class to require that the number end with "US"? With all the possibilities, I'm not sure it's possible to detect anything else without relying on those last two letters.

@jkeen
Owner

jkeen commented Jun 9, 2017

Do you have any example active tracking numbers? If you put an active Royal Mail tracking number in USPS tracking, does it resolve? Or vice versa.

@adgaudio

I found a very interesting piece of information here:

which led me to these:

  1. https://en.wikipedia.org/wiki/S10_(UPU_standard)
  2. page 2 of http://www.upu.int/uploads/tx_sbdownloader/S10TechnicalStandard.pdf
    2b. (less useful) http://www.upu.int/nc/en/activities/standards/upu-technical-standards.html?sword_list[0]=standards

My thoughts based on (2), which is a very good source:

  • UN countries use the S10 standard, as defined by Universal Postal Union for international mail.
  • We should restrict USPS13 and Royal Mail based on the last 2 characters, as you suggested.
  • We can perhaps have a "catch-all" for 13 digit numbers that represents UPU_S10 international mail of unknown origin (until we figure out what to do with it). Since those last two characters, or "country code," conform to "the two-character ISO 3166–1 code of the UPU member country under whose authority the S10 identifier was issued" (2), we could also consider making "[A-Z]{2}$" a bit more restrictive. We could also make the service code (first 2 letters) more restrictive.
  • if we follow instructions in (2) carefully, we could technically infer stuff from the service like whether the mail is insured. Not sure how into this we really want to get though.
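
As a rough illustration of the layout described in these bullets (service code, serial number, check digit, country code), here is a sketch that only splits a number into its parts. `S10_PARTS` and `parse_s10` are hypothetical names, and nothing is inferred from the service code:

```ruby
# Hypothetical parser: split an S10-style identifier into its four fields.
# Named captures mirror the structure discussed above; no checksum is verified.
S10_PARTS = /\A(?<service>[A-Z]{2})(?<serial>[0-9]{8})(?<check>[0-9])(?<country>[A-Z]{2})\z/

def parse_s10(tracking_number)
  match = S10_PARTS.match(tracking_number.to_s.gsub(/\s+/, "").upcase)
  return nil unless match

  {
    service_code: match[:service],   # first two letters
    serial_number: match[:serial],   # eight-digit serial
    check_digit: match[:check].to_i, # ninth digit
    country_code: match[:country]    # last two letters
  }
end

parse_s10("RR473124829GB")
# service_code "RR", serial_number "47312482", check_digit 9, country_code "GB"
```

Restricting the country code to known ISO 3166-1 values, or the service code to known prefixes, would be an extra lookup on top of this.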

Other stuff:

  • the case statement:
    My (untested) guess would be that the person who designed the algorithm realized that the remainder from (X%11) is not evenly distributed across the 10 available check digits (keep in mind 10 is not a valid check digit because it has two characters), so they did the best they could to even out the remainders over the available digits ([0-9]).

  • "Do you have any example active tracking numbers? If you put an active Royal Mail tracking number in USPS tracking, does it resolve? Or vice versa."

I can get tracking numbers from all over, but I can't share them unless they are more than a year old (I recall the S10 spec said that after 1 year, a tracking number can be reused). Since we have a recognized international spec, though, I bet collecting numbers won't be necessary anymore since we can generate them as we wish.
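
On the case statement: a small sketch that tabulates which check digit each possible remainder maps to, using the same mapping as the `valid_checksum?` method earlier in the thread. `check_digit_for` is a hypothetical name. It shows that the plain formula 11 - remainder would give 11 and 10 for remainders 0 and 1, so those two cases have to be remapped to single digits:

```ruby
# Hypothetical helper mirroring the case statement in valid_checksum?.
def check_digit_for(remainder)
  case remainder
  when 0 then 5 # 11 - 0 = 11 is not a single digit
  when 1 then 0 # 11 - 1 = 10 is not a single digit
  else 11 - remainder
  end
end

# Map every possible remainder (0..10) to its check digit.
table = (0..10).map { |r| [r, check_digit_for(r)] }.to_h

# Count how many remainders land on each digit.
counts = table.values.each_with_object(Hash.new(0)) { |digit, acc| acc[digit] += 1 }

counts[5] # => 2 (remainders 0 and 6 both give 5)
counts[0] # => 1 (only remainder 1 gives 0)
```

With eleven remainders and ten digits, one digit has to be used twice; in this mapping it is 5.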

@jkeen
Owner

jkeen commented Jun 11, 2017

Alex! This is awesome! What a find!

I think I'll make some changes to have a base class for the S10 standard, and then subclass them for RoyalMail and USPS with the end letter qualification. We could reference that ISO country standard and make a good guess with unknown tracking numbers that follow that standard, too.
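
A sketch of what that layout could look like, assuming the weighting and case statement from the checksum code earlier in the thread. The class and method names here (`S10`, `RoyalMail`, `USPS13`, `country_code`, `valid?`) are hypothetical and not the gem's actual API:

```ruby
# Hypothetical base class for S10-style numbers; subclasses pin the suffix.
class S10
  WEIGHTS = [8, 6, 4, 2, 3, 5, 9, 7].freeze
  FORMAT = /\A[A-Z]{2}([0-9]{8})([0-9])([A-Z]{2})\z/

  attr_reader :tracking_number

  def initialize(tracking_number)
    @tracking_number = tracking_number.to_s.gsub(/\s+/, "").upcase
  end

  # nil means "accept any country code"; subclasses override this.
  def self.country_code
    nil
  end

  def valid?
    match = FORMAT.match(tracking_number)
    return false unless match

    required = self.class.country_code
    return false if required && match[3] != required

    valid_checksum?(match[1], match[2].to_i)
  end

  private

  # Weighted mod 11 with the same special cases as the thread's code.
  def valid_checksum?(serial, check_digit)
    sum = serial.chars.zip(WEIGHTS).sum { |digit, weight| digit.to_i * weight }
    remainder = sum % 11
    expected = case remainder
               when 0 then 5
               when 1 then 0
               else 11 - remainder
               end
    expected == check_digit
  end
end

class RoyalMail < S10
  def self.country_code
    "GB"
  end
end

class USPS13 < S10
  def self.country_code
    "US"
  end
end

RoyalMail.new("RR473124829GB").valid? # => true
USPS13.new("RR473124829GB").valid?    # => false
S10.new("RR473124829GB").valid?       # => true
```

The base class doubles as the catch-all for numbers that follow the standard but have an unrecognized suffix.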

Inferring all we can out of those numbers would be great. I started doing that with other ones with the decode methods, but lost steam on it a while ago. I need to firm up that API for what each tracking number should ideally return.

@jkeen
Owner

jkeen commented Jun 12, 2017

I found this, and scraped the pages to get all the info on each country. I've got a Generic S10 class that reads off the last two letters and sees if there's a matching key in this file (which you can use, also).
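
The lookup could be as simple as the sketch below. The layout of the real file is not shown in this thread, so the inline JSON (country code keys with a "courier" field) is an assumed shape, and `COUNTRIES` and `s10_courier` are hypothetical names:

```ruby
require "json"

# Inline stand-in for the scraped country file. The shape below
# (country code => { "courier" => name }) is an assumption for illustration.
COUNTRIES = JSON.parse(<<~EOS).freeze
  {
    "GB": { "courier": "Royal Mail" },
    "US": { "courier": "United States Postal Service" },
    "IN": { "courier": "India Post" }
  }
EOS

# Hypothetical lookup: read the last two letters and look for a matching key.
def s10_courier(tracking_number)
  normalized = tracking_number.to_s.gsub(/\s+/, "").upcase
  return nil unless normalized.match?(/\A[A-Z]{2}[0-9]{9}[A-Z]{2}\z/)

  entry = COUNTRIES[normalized[-2, 2]]
  entry && entry["courier"]
end

s10_courier("RR473124829GB") # => "Royal Mail"
s10_courier("RR473124829ZZ") # => nil
```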

@adgaudio

That's great! If we were to assume that both of us read off of that JSON file, should we put it in a shared repository somewhere? I guess I would need to store a copy of it in the Java library (MysteryTrackingNumber) and occasionally synchronize the files? Short of storing copies of the code, I'm not sure what the best way is to have multiple libraries using this JSON file and future ones like it.

If we created a space for it (i.e. a directory in your repo), do we create scripts like "sync_s10.sh" that all the libraries can use? If we do that, maybe there should be a dedicated repository containing only JSON files and a fetch script that language-specific libraries can use. The fetch script, as used by libraries, could download the JSON files into a dedicated default directory titled "upstream_files_do_not_modify" or something.


personal note: I will have some more time on Tues/Wed to dig into code and build something.

@adgaudio

Also, I was checking out your changes - do you think this line should go into the s10.json file?

https://github.com/jkeen/tracking_number/compare/feature/royal_mail#diff-54f7e2c2bc3ef88a7868b9299522545cR10

And maybe we should remove the "contribution units" from that file - are they useful?

@jkeen
Owner

jkeen commented Jun 15, 2017

I think we could make another repo with some shared json files in it, version it, and then make it a git submodule. That way we can skip the scripts. Each implementation can put the files wherever they want. I've done other projects with submodules, and sometimes they're frowned upon or people think they're a pain, but this seems like a perfect use case.

If we want to sync up on names, data seems like a good folder name. Not sure how best to define the json file we were talking about in #26, but I think it'd be easy for it to get unruly and complicated, so we'll have to be careful.

Yeah, contribution units could go. It was basically just an indicator of how developed the country was, and before I decided to support all of them (and went through editing keys) I was only going to pick the most used/developed countries to include (>25 contribution units).

I'm about to drop off the map for a couple of weeks while vacationing, but I'll be back and at it in July.

@adgaudio

Oh submodules are the way to go! I also like using them, despite their trickiness. This is a good use case.

Have a great vacation. If you don't create the repo before leaving (I'd like write permission to it please), I will create one and add you. I'll try to figure out if we can reduce the non-S10 couriers to simple JSON.

@jkeen
Owner

jkeen commented Jun 15, 2017

Thanks. Created and invited to the creatively named tracking_number_data . I was trying to think of something cleverer, but I feel kinda locked in with my 2010 underscore choice on tracking_number and went for consistency. Let's see how it shakes out, hope you figure some good stuff out in the next couple of weeks.

@adgaudio

Perfect - thanks :)

Have a good trip!

@adgaudio

Note for future reference: this issue can be closed when jkeen/tracking_number_data#1 is integrated into this repo.

@jkeen
Owner

jkeen commented Aug 3, 2017

This can be closed with #28, which adds support for S10 tracking numbers, including Royal Mail.

@jkeen
Owner

jkeen commented Jan 31, 2018

I just released 1.0.0.pre1, which resolves this issue, @pasku. The final release should be coming in the next day or so.

@jkeen jkeen closed this as completed Jan 31, 2018
3 participants