Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix calendar generating start_date > end_date #78

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

miklcct
Copy link
Contributor

@miklcct miklcct commented Nov 23, 2023

This prevents any calendar entries with start_date > end_date from leaking out by preventing them from being returned at the first instance.

fixes #76

@linusnorton
Copy link
Collaborator

Thank you for taking the time to write a patch. I don't have the latest data to hand - do you know why the end date before the start date? Given it's on the associated schedule (W27693_W28273) and not the cloned journeys I think the suspect logic would be:

calendar.clone(
  moment.max(this.calendar.runsFrom, calendar.runsFrom),
  moment.min(this.calendar.runsTo, calendar.runsTo)
),

I can't remember why we would use the calendar from the assoc train service (calendar) rather than the association itself. Logically I would say that the new service will only run for the duration of the association.

Does changing that code in mergeSchedules to this.calendar fix the issue?

@miklcct
Copy link
Contributor Author

miklcct commented Nov 23, 2023

The original clone function accidentally mutates the parameters passed in (hence I have added two lines of .clone()) causing an invalid schedule, which I spent hours of debugging as the bugged schedule suddenly appeared and it didn't exist at the generation source.

After I added the two lines of .clone(), the original bug disappeared but a few pairs of [date+1, date] invalid ranges appeared.

The following is the raw data:

AANW27693W282732311132312011111100VVSCRDFCEN  TP                               P
AANW27693W282732311212311240111100   CRDFCEN  T                                C
AANW27693W282732311272312011111100   CRDFCEN  T                                C
BSNW276932311132312011111100 POO2M04    125442000 DMUS   075      S            P
BSNW282732311132312011111100 POO2V02    125447000 DMUS   075      S            P
BSNW282732311142311240111100 POO2V02    125447000 DMUS   075      S            O
BSNW276932311212311240111100 POO2M04    125442000 DMUS   075      S            O
BSNW276932311272312011111100 POO2M04    125442000 DMUS   075      S            O
BSNW282732311272312011111100 POO2V02    125447000 DMUS   075      S            O
BSNW276932312042312081111100 POO2M04    125442000 DMUS   075      S            P
BSNW282732312042312081111100 POO2V02    125447000 DMUS   075      S            P

and the CSV exported from the database

"id","train_uid","runs_from","runs_to","monday","tuesday","wednesday","thursday","friday","saturday","sunday","bank_holiday_running","train_status","train_category","train_identity","headcode","course_indicator","profit_center","business_sector","power_type","timing_load","speed","operating_chars","train_class","sleepers","reservations","connect_indicator","catering_code","service_branding","stp_indicator"
"60145","W27693","2023-11-13","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"63140","W27693","2023-11-21","2023-11-24","0","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"84553","W27693","2023-11-27","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"101522","W27693","2023-12-04","2023-12-08","1","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"60146","W28273","2023-11-13","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"60162","W28273","2023-11-14","2023-11-24","0","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"84582","W28273","2023-11-27","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"101523","W28273","2023-12-04","2023-12-08","1","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"id","base_uid","assoc_uid","start_date","end_date","monday","tuesday","wednesday","thursday","friday","saturday","sunday","assoc_cat","assoc_date_ind","assoc_location","base_location_suffix","assoc_location_suffix","association_type","stp_indicator"
"428","W27693","W28273","2023-11-13","2023-12-01","1","1","1","1","1","0","0","VV","S","CRDFCEN",,,"P","P"
"441","W27693","W28273","2023-11-21","2023-11-24","0","1","1","1","1","0","0",NULL,NULL,"CRDFCEN",,,NULL,"C"
"694","W27693","W28273","2023-11-27","2023-12-01","1","1","1","1","1","0","0",NULL,NULL,"CRDFCEN",,,NULL,"C"

@linusnorton
Copy link
Collaborator

Thanks, I'm just wondering if it's possible to prevent the null returns in order to simplify the code

@linusnorton
Copy link
Collaborator

Matching up the associations and services with the overlays makes it look like the start date for W28273 is wrong:

AAN W27693 W28273 231113 231201 1111100VVSCRDFCEN  TP                               P
BSN W27693        231113 231201 1111100 POO2M04    125442000 DMUS   075      S            P
BSN W28273        231113 231201 1111100 POO2V02    125447000 DMUS   075      S            P

AAN W27693 W28273 231127 231201 1111100   CRDFCEN  T                                C
BSN W27693        231127 231201 1111100 POO2M04    125442000 DMUS   075      S            O
BSN W28273        231127 231201 1111100 POO2V02    125447000 DMUS   075      S            O

AAN W27693 W28273 231121 231124 0111100   CRDFCEN  T                                C
BSN W27693        231121 231124 0111100 POO2M04    125442000 DMUS   075      S            O
BSN W28273        **231114** 231124 0111100 POO2V02    125447000 DMUS   075      S            O

At that point the min/max should kick in and the max of the start date should use the AA record start date and the start and end should come out as 231124. That would suggest the merged record is fine but then we will create another service for before the association:

    if (assoc.calendar.runsFrom.isBefore(assocCalendar.runsFrom)) {
      const before = assoc.calendar.clone(assoc.calendar.runsFrom, assocCalendar.runsFrom.clone().subtract(1, "days"));

      schedules.push(assoc.clone(before, idGenerator.next().value));
    }

I regret the naming of assoc and assocCalendar. assoc.calendar is the the assoc service and assocCalendar is assoc record itself.

The new before service should run from 231114 to 231121 - 1 day so I'm not quite sure how it still goes wrong.

Sorry for thinking out loud. I haven't touched this code in years and the "correct" behaviour is not always obvious.

@linusnorton
Copy link
Collaborator

@miklcct I think they have corrected the data this morning so that the runs_from field for:

BSN W28273        **231114** 231124 0111100 POO2V02    125447000 DMUS   075      S            O

Now matches the association record.

I'm inclined to merge the fix for the mutation bug you found but not the changes that allow null schedules.

@miklcct
Copy link
Contributor Author

miklcct commented Nov 24, 2023

If I only fix the mutation bug it generates a GTFS with a few calendar ranges between Christmas and New Year with start_date being 1 day after the end_date, so I believe returning null from clone() is necessary, or disable the calendar tightening altogether.

@linusnorton
Copy link
Collaborator

I did a quick check with today's data and I couldn't find any cases where the start date was after the end date. What records are you seeing?

@miklcct
Copy link
Contributor Author

miklcct commented Nov 24, 2023

I have just run the generator on the latest master branch with version 930 data, with only the two lines of .clone() added to fix the mutation issue.

Errors are generated as follows:
filename (?)The name of the faulty file. csvRowNumber (?)The row number of the faulty record. entityId (?)The faulty service id. startFieldName (?)The start value's field name. startValue (?)The start value. endFieldName (?)The end value's field name. endValue (?)The end value.
"calendar.txt" 156 "155" "start_date" "20231203" "end_date" "20231126"
"calendar.txt" 493 "492" "start_date" "20231208" "end_date" "20231207"
"calendar.txt" 539 "538" "start_date" "20231127" "end_date" "20231124"
"calendar.txt" 596 "595" "start_date" "20231204" "end_date" "20231201"
"calendar.txt" 886 "885" "start_date" "20231130" "end_date" "20231129"
"calendar.txt" 887 "886" "start_date" "20231130" "end_date" "20231124"
"calendar.txt" 888 "887" "start_date" "20231129" "end_date" "20231128"
"calendar.txt" 1037 "1036" "start_date" "20231206" "end_date" "20231129"
"calendar.txt" 1038 "1037" "start_date" "20231205" "end_date" "20231128"
"calendar.txt" 1090 "1089" "start_date" "20231209" "end_date" "20231202"
"calendar.txt" 1131 "1130" "start_date" "20231202" "end_date" "20231125"
"calendar.txt" 3230 "3229" "start_date" "20231214" "end_date" "20231213"
"calendar.txt" 3520 "3519" "start_date" "20240105" "end_date" "20231229"
"calendar.txt" 4531 "4530" "start_date" "20240126" "end_date" "20240119"
"calendar.txt" 4769 "4768" "start_date" "20240127" "end_date" "20240120"
"calendar.txt" 4801 "4800" "start_date" "20240120" "end_date" "20240113"
"calendar.txt" 4802 "4801" "start_date" "20240106" "end_date" "20231230"

The full report can be viewed here.

@linusnorton
Copy link
Collaborator

Ah it looks like your on the weekly feed. I'm on the daily which is 932

@linusnorton
Copy link
Collaborator

RJTTC931.ZIP
RJTTC932.ZIP

Try processing these after the 930

@miklcct
Copy link
Contributor Author

miklcct commented Nov 24, 2023

Error still exist on version 932:
https://gtfs-validator-results.mobilitydata.org/9f2b4ecb-81c5-4ac6-9ee4-974989ed4507/report.html

filename (?)The name of the faulty file. csvRowNumber (?)The row number of the faulty record. entityId (?)The faulty service id. startFieldName (?)The start value's field name. startValue (?)The start value. endFieldName (?)The end value's field name. endValue (?)The end value.
"calendar.txt" 163 "162" "start_date" "20231203" "end_date" "20231126"
"calendar.txt" 240 "239" "start_date" "20231207" "end_date" "20231206"
"calendar.txt" 632 "631" "start_date" "20231127" "end_date" "20231124"
"calendar.txt" 640 "639" "start_date" "20231205" "end_date" "20231204"
"calendar.txt" 690 "689" "start_date" "20231204" "end_date" "20231201"
"calendar.txt" 696 "695" "start_date" "20231208" "end_date" "20231201"
"calendar.txt" 849 "848" "start_date" "20231208" "end_date" "20231207"
"calendar.txt" 1015 "1014" "start_date" "20231204" "end_date" "20231130"
"calendar.txt" 1034 "1033" "start_date" "20231130" "end_date" "20231129"
"calendar.txt" 1035 "1034" "start_date" "20231130" "end_date" "20231124"

@linusnorton
Copy link
Collaborator

I get a completely different set of data
gb-rail.zip

@linusnorton
Copy link
Collaborator

This is my 930: https://file.io/yK4ODhtZDv6b - is it the same as yours?

@miklcct
Copy link
Contributor Author

miklcct commented Nov 24, 2023

Yes the files are the same. Have you added the two lines of .clone() in ScheduleCalendar.ts lines 113 and 114?

@linusnorton
Copy link
Collaborator

No, just running from master. I'll try again with the clone methods

@linusnorton
Copy link
Collaborator

That's so weird! Adding the clones, which I would expect to be fine then gives me the same results as you. I'll debug a bit further. Sorry this is taking so long.

miklcct pushed a commit to Jnction/dtd2mysql that referenced this pull request Jan 10, 2024
An upstream bug causing pipeline failures

[PR submitted to upstream](planarnetwork#78)

Related work items: #951
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GTFS produced by timetable revision 930 contains invalid calendar
2 participants