MSP June 4th 2019 .mpp file format breaking changes #107

Closed
ndarlington opened this issue Jun 26, 2019 · 20 comments

Comments

@ndarlington

ndarlington commented Jun 26, 2019

Projects saved with versions of MSP that contain the June 4th fixes appear to differ enough in their task data and metadata records, compared with earlier builds from the same release lines, to cause some significant data corruption when they are read.

Reference:
https://support.microsoft.com/en-us/help/4464589/june-4-2019-update-for-project-2016-kb4464589

Although this update came out a few weeks ago, I believe some of Microsoft's 'monthly' distribution and update channels have only begun pushing it out to desktops in the last few days.

  • We still need to do testing to see the full scope of this. We know it strongly affects Tasks and it doesn't appear to affect Calendar data, but more detailed looks at record/data types like Resources, Dependencies, and Assignments are TBD.
  • Some of the corruption we found when reading the .mpp file and writing it back out as .xml included:
    • All or most tasks that were active (1) were coming out as inactive (0) in the xml file.
    • Many summary tasks were having their summary element set to 0 in the xml file.
    • Some tasks appeared to get duplicated (same 'ID' or 'MSP row number' but different 'UID' values). This also has a knock-on effect in skewing the OutlineLevel numbers to accommodate the dupes.
    • Some custom field values (e.g. "Text3") could end up attached to completely different tasks (usually the weird dupes). This is very bad news for us, as we use these fields to hold unique references to external systems for updates. It didn't always happen, though, so it was a bit hit and miss.
    • The misaligned reading of the task records seemed to cause a large number of 'Enterprise Flag' fields, which don't exist at all in the same .mpp, to be spontaneously added to the xml file. These fields were only ever exported with the flag set to 1, never 0, and which fields appeared varied from task to task without much consistency (anywhere from 0 to 7 flag fields per task).
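For anyone checking converted output for the duplicated-task symptom above, a quick test is whether any task ID maps to more than one UID. This is a minimal standalone sketch, not MPXJ code: `TaskRow`, `DuplicateTaskCheck` and `findDuplicateIds` are hypothetical names, and it assumes the (ID, UID) pairs have already been extracted, for example from the converted XML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal view of a task: the visible row ID and the unique ID.
final class TaskRow {
    final int id;
    final int uniqueId;

    TaskRow(int id, int uniqueId) {
        this.id = id;
        this.uniqueId = uniqueId;
    }
}

final class DuplicateTaskCheck {
    // Returns every ID seen with more than one distinct UID,
    // mapped to those UIDs in encounter order.
    static Map<Integer, Set<Integer>> findDuplicateIds(List<TaskRow> rows) {
        Map<Integer, Set<Integer>> uidsById = new LinkedHashMap<>();
        for (TaskRow row : rows) {
            uidsById.computeIfAbsent(row.id, k -> new LinkedHashSet<>()).add(row.uniqueId);
        }
        Map<Integer, Set<Integer>> duplicates = new LinkedHashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : uidsById.entrySet()) {
            if (e.getValue().size() > 1) {
                duplicates.put(e.getKey(), e.getValue());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<TaskRow> rows = new ArrayList<>();
        rows.add(new TaskRow(1, 1));
        rows.add(new TaskRow(2, 2));
        rows.add(new TaskRow(2, 57)); // same row ID, different UID: the "dupe" pattern
        rows.add(new TaskRow(3, 3));
        System.out.println(findDuplicateIds(rows)); // prints {2=[2, 57]}
    }
}
```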

These new-version (June+) .mpp files still open perfectly fine in prior versions of MSP (e.g. the March release), without any of these issues, but I guess that's down to MSP reading the OLEDoc format differently from what the MPP reader class expects.

I plan to get and upload copies of the side-by-side .mpp files from March (good) and June (bad) releases of MSP if that helps, along with their converted-to-xml counterparts.

@ndarlington
Author

I'm sorry to say that my understanding of how the data is read from the .mpp files is too limited for me to fully follow what's going on, although I did see that the bulk of the misaligned data seemed to appear in what you would call the taskFixed2Data and, to a (possibly lesser) extent, in the taskFixedData before it.

For example, taskFixed2Data would normally be populated with byte arrays of around 64 bytes each. With the new version of MSP, some of those byte arrays ballooned to as much as 260k bytes per row, whether or not they were just filled with zeros.
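One way to spot the ballooning rows while debugging is to flag any row much larger than the usual size. This is a hypothetical standalone helper (`RowSizeCheck.findOversizedRows`, using an arbitrary multiplier as the threshold); it assumes the rows are available as a list of byte arrays and does not reflect how MPXJ itself reads taskFixed2Data.

```java
import java.util.ArrayList;
import java.util.List;

final class RowSizeCheck {
    // Returns the indexes of rows longer than expectedSize * factor.
    // expectedSize is the usual row size (around 64 bytes in this report).
    static List<Integer> findOversizedRows(List<byte[]> rows, int expectedSize, int factor) {
        List<Integer> oversized = new ArrayList<>();
        long limit = (long) expectedSize * factor;
        for (int i = 0; i < rows.size(); i++) {
            byte[] row = rows.get(i);
            if (row != null && row.length > limit) {
                oversized.add(i);
            }
        }
        return oversized;
    }

    public static void main(String[] args) {
        List<byte[]> rows = new ArrayList<>();
        rows.add(new byte[64]);
        rows.add(new byte[260_000]); // the kind of row size reported after the June update
        rows.add(new byte[60]);
        System.out.println(findOversizedRows(rows, 64, 4)); // prints [1]
    }
}
```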

Wish I could be more help in pinpointing the fix or patch.

One other detail I forgot to include in my original posting: we first noticed there was a problem at all because we started getting OutOfMemory exceptions for project plans that had been processing fine beforehand.

For example, when running the 'read mpp, save as xml' conversion on the mpp generated by the March release of MSP, setting the Java VM flag -Xmx256m gave sufficient headroom to process it (the test project had 10k tasks in it, but they're simple/basic).

For the same project plan and data saved by the June update release of MSP, I had to increase that to -Xmx850m before it would process without error.

Unfortunately we can't just tweak those numbers where we actually use this, because it runs in C# inside 32-bit add-ins for MSP. Now that the data corruption is making memory use shoot up three times or more, we hit the 2GB 32-bit process ceiling (even when MSP itself is only pegging around 700-800MB in Task Manager).
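When reproducing the memory growth, it can help to confirm which heap ceiling the JVM actually picked up from -Xmx, and how much the conversion ends up using. A minimal sketch using only `java.lang.Runtime`; `HeapReport`, `maxHeapMb` and `usedHeapMb` are illustrative names, and the reported maximum is usually close to, but not always exactly, the -Xmx value.

```java
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use (roughly the -Xmx setting).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently in use: memory allocated to the JVM minus the free part of it.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMb() + " MB");
        // ... run the mpp -> xml conversion here ...
        System.out.println("Used heap: " + usedHeapMb() + " MB");
    }
}
```

Running it once with -Xmx256m and once with -Xmx850m should show the two ceilings mentioned above.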

@ndarlington
Author

This, very simply, is the Java code I used to open the mpp and save the xml so that I could experiment more with the issues we were facing on the C# side:

package com.regoconsulting.mpxj;

import net.sf.mpxj.MPXJException;
import net.sf.mpxj.ProjectFile;
import net.sf.mpxj.TaskField;
import net.sf.mpxj.mspdi.MSPDIWriter;
import net.sf.mpxj.reader.UniversalProjectReader;

import java.io.IOException;

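public class Main {

    public static void main(String[] args) throws MPXJException, IOException {
        // Skip a leading "com.rego..." argument (presumably the main class name passed
        // through by the launcher) and only convert if a file name actually follows it.
        int index = (args.length > 0 && args[0].startsWith("com.rego")) ? 1 : 0;
        if (index < args.length)
            convertUsingMpxj(args[index]);
    }

    private static void convertUsingMpxj(String filename) throws MPXJException, IOException {
        // UniversalProjectReader detects the file type (here, .mpp) and uses the matching reader.
        UniversalProjectReader reader = new UniversalProjectReader();
        ProjectFile project = reader.read(filename);
        if (project == null) {
            System.out.println(filename + " is not a recognized project file");
            return;
        }

        // Leftover from earlier inspection work; not relevant to this issue.
        System.out.println(filename + " has lookup table:\n" + project.getCustomFields().getCustomField(TaskField.TEXT1).getLookupTable().toString());

        // Write the project back out in MSPDI (MS Project XML) format for comparison.
        MSPDIWriter writer = new MSPDIWriter();
//        writer.setWriteTimephasedData(true);
        writer.write(project, filename + "-converted.xml");

        System.out.println("Finished");
    }
}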

You can also ignore the lookup table println() statement; it's left over from some earlier inspection work and isn't significant for this issue.

@joniles
Owner

joniles commented Jun 26, 2019

Could you add a couple of sample files so I can take a look? Thanks!

@ndarlington
Author

corrupted_projects_issue_107.zip

Absolutely. These are some projects from our tester; the two *new.mpp files are what they saved with the latest MSP version update.

I then opened those in my version of MSP and saved them back out as *previous.mpp (as I know that uses the prior mpp format/layout).

I thought that should give a nice apples-to-apples comparison.

I then ran all 4 projects (2 new, 2 previous) through the above conversion to XML.

Everything I reported above still held true, so hopefully you'll see the same. Comparing the XML files in WinMerge wasn't too painful (Notepad++ would choke on the bigger files, though).
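As an alternative to eyeballing the diff in WinMerge, the Active and Summary flags can be compared task by task. This is a minimal sketch using only the JDK's DOM parser. It assumes each converted file contains `<Task>` elements with direct `<UID>`, `<Active>` and `<Summary>` children, as MSPDI task records do; `TaskFlagDiff` and its methods are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class TaskFlagDiff {
    // Text of the first direct child element with the given name, or null if absent.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // Maps each task UID to an "Active/Summary" string such as "1/0".
    static Map<String, String> readTaskFlags(Document doc) {
        Map<String, String> flags = new LinkedHashMap<>();
        NodeList tasks = doc.getElementsByTagName("Task");
        for (int i = 0; i < tasks.getLength(); i++) {
            Element task = (Element) tasks.item(i);
            String uid = childText(task, "UID");
            if (uid != null) {
                flags.put(uid, childText(task, "Active") + "/" + childText(task, "Summary"));
            }
        }
        return flags;
    }

    // Lists tasks whose flags changed, plus UIDs that appear in only one of the files.
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> differences = new ArrayList<>();
        for (Map.Entry<String, String> e : before.entrySet()) {
            String other = after.get(e.getKey());
            if (other == null) {
                differences.add("UID " + e.getKey() + " missing from second file");
            } else if (!other.equals(e.getValue())) {
                differences.add("UID " + e.getKey() + ": " + e.getValue() + " -> " + other);
            }
        }
        for (String uid : after.keySet()) {
            if (!before.containsKey(uid)) {
                differences.add("UID " + uid + " only in second file");
            }
        }
        return differences;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: XML converted from a *previous.mpp; args[1]: XML from the matching *new.mpp
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document before = factory.newDocumentBuilder().parse(new File(args[0]));
        Document after = factory.newDocumentBuilder().parse(new File(args[1]));
        diff(readTaskFlags(before), readTaskFlags(after)).forEach(System.out::println);
    }
}
```

Duplicated tasks carry new UIDs, so they would show up in the "only in second file" lines rather than as flag changes.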

I also noted in the smaller project that a resource calendar was missing from the XML, even though it didn't really have any actual data to lose anyway. Still, I might need to get a better example of that to check with.

@ndarlington
Author

Following up on the resource calendar I noted as missing from the smaller project's XML: it turns out that was just the internal/hidden (UID 0) resource's calendar entry. That's not something that affects us personally, and the other base/resource calendar data looked intact and OK as far as I could see.

So these are still all task-level issues.

@joniles
Copy link
Owner

joniles commented Jun 26, 2019

Thanks for the files, I'll take a look as soon as I can. Were the sample files produced with MS Project 2019+current updates or MS Project 2016+current updates?

@ndarlington
Author

Sort of neither: it's Office online with Click-to-Run (as far as I know, they no longer differentiate between 2016 and 2019 for these, only for the desktop install versions).

We do know it's the 'June' update that breaks things for all of them, though, as we had (ironically) been waiting for one of the fixes in it; while that fix seems to work, the update also caused this issue in return.

This is the version that broke things and produced the '*new.mpp' files:
msp_bad_version

Here's the one we'd been using for months prior that is (still) fine and was used to produce the '*previous.mpp' files:

msp_good_version

If I can provide anything else, I'll be happy to.

@ndarlington
Author

For the desktop install versions of MSP, we would expect 2016 + current updates (including the update linked in the initial issue description) to have the fault.

@joniles
Copy link
Owner

joniles commented Jun 27, 2019

I've fixed the issue, the updated code is now in Git. I'll probably release a new version of MPXJ today.

@joniles joniles closed this as completed Jun 27, 2019
@ndarlington
Author

Thanks again, Jon. We've been able to upgrade to this version and test it on both the good and 'bad' versions of MSP, and it's been working like a champ.

I only wish I could have fixed it and saved you the bother :)

@akhilnaruto

akhilnaruto commented Sep 6, 2019

@joniles / @ndarlington
Is there any workaround for this issue? I'm on an older version of MPXJ, so I'm wondering whether one is available.

Thanks in advance for the help

@joniles
Owner

joniles commented Sep 6, 2019

@akhilnaruto, unfortunately the only workaround would be to use "save as" to save your MS Project schedule in an older version of the file format, which will probably resolve the issue. Otherwise, a code change in MPXJ is required. I'm not sure whether you're stuck on an old version for a particular reason; if you need everything else to stay the same, backporting the small change might be one approach you could take. Easiest just to upgrade, though!

@akhilnaruto

@joniles,
Thank you so much for the quick response.

@akhilnaruto

akhilnaruto commented Sep 6, 2019

@joniles, can you please confirm whether the commit below is the only one needed to fix this issue?
628df2b

I applied the commit above locally and it seems to have resolved the issue; I just wanted to double-check with you.

Also, can you please help me understand what caused this issue? Looking through the commit, I couldn't work out how the fix works.

Again, thank you so much for the help.

@joniles
Owner

joniles commented Sep 6, 2019

@akhilnaruto that's correct, that is the commit which fixes the issue.

The cause of the issue is that we don't fully understand how a lot of the MPP file format works, so MPXJ uses a number of heuristics to make a best guess at what the format is. In this case, the list of numbers at the end of the constructor's arguments is a list of possible block sizes. We examine the data we have and work out which of these block sizes best fits its length. It looks like the most recent version of MS Project changes the block size.
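The "pick the block size that best fits the data length" idea can be illustrated with a small sketch. This is not MPXJ's actual code: `BlockSizeGuess.chooseBlockSize` is a hypothetical helper, and the scoring rule here (prefer the candidate leaving the smallest remainder when the data is split into whole blocks, ties going to the earlier candidate) is an assumption made for illustration.

```java
final class BlockSizeGuess {
    // Picks the candidate block size that best fits dataLength: the one leaving the
    // smallest remainder when the data is divided into whole blocks.
    // Ties go to the candidate listed first; returns -1 if no candidate fits at all.
    static int chooseBlockSize(int dataLength, int[] candidates) {
        int best = -1;
        int bestRemainder = Integer.MAX_VALUE;
        for (int size : candidates) {
            if (size <= 0 || size > dataLength) {
                continue; // cannot hold even one whole block
            }
            int remainder = dataLength % size;
            if (remainder < bestRemainder) {
                best = size;
                bestRemainder = remainder;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int dataLength = 70 * 10; // ten records that are really 70 bytes each
        // Without 70 in the list, 68 wins (remainder 20 vs 60 for 64): a wrong guess.
        System.out.println(chooseBlockSize(dataLength, new int[] {64, 68}));     // prints 68
        // Once the true size is a candidate, it divides the data exactly and wins.
        System.out.println(chooseBlockSize(dataLength, new int[] {64, 68, 70})); // prints 70
    }
}
```

With a wrongly guessed 68-byte stride, record n would be read from offset 68n instead of 70n, so each successive record drifts a further 2 bytes out of alignment. That is the kind of misreading that could plausibly scatter field values across the wrong tasks, as described at the top of this issue.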

One day we might work out where in the file this block size is defined, or the algorithm for determining it... but for the moment the best we have is heuristics.

@akhilnaruto

@joniles
Thank you so much.

@joniles
Owner

joniles commented Sep 6, 2019

@akhilnaruto

@joniles ,
Yes, I posted this question yesterday. Can I mark it as answered and link to this issue, so that someone can find it if required?

@joniles
Owner

joniles commented Sep 6, 2019

@akhilnaruto I've added an answer to the question pointing back to this conversation!

@akhilnaruto

@joniles Thank you so much

3 participants