Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(create-expo): avoid corrupting .tar.gz files in templates #26741

Merged

Conversation

shirakaba
Copy link
Contributor

@shirakaba shirakaba commented Jan 27, 2024

Why

tar.extract() (from node-tar, used by create-expo here), by default, parses files from within a .tar.gz as UTF-8. This is generally incorrect for binary files inside the tarball. I found, when creating an Expo template containing a .tar.gz file (or equally a .zip file) inside, that create-expo would successfully unpack most of the template, but corrupt the inner .tar.gz file. It affects both cloning from GitHub and from a local tarball.

I've provisionally applied this minimal fix to create-expo, and also to @expo/cli (as it seems the createFileTransform() implementation is mirrored there). We could of course cover more types like .zip, but as the scope is basically "all possible binary files", I think the list is endless.

Importance: I can give Expo superpowers if this PR lands 👀

How

# First, duplicate expo-template-bare-minimum.
# Installing it from npm is a convenient way to do so.
mkdir -vp workdir/expo-template-custom
cd workdir
npm install expo-template-bare-minimum
mv node_modules/expo-template-bare-minimum expo-template-custom

# Now, add any .tar.gz file to it.
# Let's say it's called example.tar.gz and is 70 MB in size.
mv path/to/example.tar.gz expo-template-custom
cd expo-template-custom
npm pack
cd ..

# Next, install the CLI locally.
npm install create-expo-app@latest

# Now make the patch to node_modules/create-expo-app/build/index.js.

# Create a new project using your local Expo template, via the forked CLI
./node_modules/.bin/create-expo-app --template ./expo-template-custom/expo-template-custom-50.0.33.tgz verificationapp

# Look inside the created app and check whether the size of example.tar.gz is still 70 MB.
# Without this patch, it was ballooning to 140 MB, so the difference is easy to spot.
ls -la verificationapp

Test Plan

Local test plan covered in the "How" section above. I think there is no risk to this change as none of the templates include .tar.gz files currently, and all it does is fix something that's broken to begin with.

Checklist

I think I'll check the same choices as Cedric did in this similar PR.

@expo-bot expo-bot added the bot: suggestions ExpoBot has some suggestions label Jan 27, 2024
tar.extract(), by default, parses files from within a .tar.gz as UTF-8. This is generally incorrect for binary files. I found, when creating an Expo template involving a .tar.gz file (or equally a .zip file) inside, that create-expo would successfully unpack most of the template, but corrupt the inner .tar.gz file.

I've applied this minimal fix to create-expo, and also to @expo/cli (as it seems the createFileTransform() implementation is mirrored there).
@shirakaba shirakaba force-pushed the @shirakaba/fix-create-expo-gzip-handling branch from 4df58f9 to 96a3ac7 Compare January 27, 2024 13:42
@expo-bot expo-bot added bot: passed checks ExpoBot has nothing to complain about and removed bot: suggestions ExpoBot has some suggestions labels Jan 27, 2024
@shirakaba
Copy link
Contributor Author

shirakaba commented Jan 28, 2024

Follow-up: This certainly fixes one problem, but it caught me again in another case (this time with a file other than .tar.gz). I tried committing an .xcframework to the template and it corrupted all the binaries within. The binaries have no file extensions, so it's even harder to safely limit this change to just them.

Really starting to think the tar npm package is forcing compromises here (compared to decompressing on the CLI). Will continue to think about options. For now, assuming this PR can be merged, I'll tar any files that contain binaries, and untar them on postinstall, but it's a real pain so I'd like to discuss ways forward.

Copy link
Member

@byCedric byCedric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @shirakaba ❤️ Let's keep chatting about a possible better fix to account for this scenario.

packages/@expo/cli/CHANGELOG.md Outdated Show resolved Hide resolved
packages/create-expo/CHANGELOG.md Outdated Show resolved Hide resolved
@byCedric byCedric merged commit f0bfc4e into expo:main Jan 30, 2024
8 of 11 checks passed
byCedric added a commit that referenced this pull request Feb 1, 2024
# Why

`tar.extract()` (from [node-tar](https://github.com/isaacs/node-tar),
used by `create-expo`
[here](https://github.com/expo/expo/blob/8d29bbba0ccd4ecb35bac0df7b512ee129d3cde8/packages/create-expo/src/Examples.ts#L87)),
by default, parses files from within a `.tar.gz` as UTF-8. This is
generally incorrect for binary files inside the tarball. I found, when
creating an Expo template containing a `.tar.gz` file (or equally a
`.zip` file) inside, that `create-expo` would successfully unpack most
of the template, but corrupt the inner `.tar.gz` file. It affects both
cloning from GitHub and from a local tarball.

I've provisionally applied this minimal fix to `create-expo`, and also
to `@expo/cli` (as it seems the `createFileTransform()` implementation
is mirrored there). We could of course cover more types like `.zip`, but
as the scope is basically "all possible binary files", I think the list
is endless.

Importance: I can give Expo superpowers if this PR lands 👀

# How

```sh
# First, duplicate expo-template-bare-minimum.
# Installing it from npm is a convenient way to do so.
mkdir -vp workdir/expo-template-custom
cd workdir
npm install expo-template-bare-minimum
mv node_modules/expo-template-bare-minimum expo-template-custom

# Now, add any .tar.gz file to it.
# Let's say it's called example.tar.gz and is 70 MB in size.
mv path/to/example.tar.gz expo-template-custom
cd expo-template-custom
npm pack
cd ..

# Next, install the CLI locally.
npm install create-expo-app@latest

# Now make the patch to node_modules/create-expo-app/build/index.js.

# Create a new project using your local Expo template, via the forked CLI
./node_modules/.bin/create-expo-app --template ./expo-template-custom/expo-template-custom-50.0.33.tgz verificationapp

# Look inside the created app and check whether the size of example.tar.gz is still 70 MB.
# Without this patch, it was ballooning to 140 MB, so the difference is easy to spot.
ls -la verificationapp
```

# Test Plan

Local test plan covered in the "How" section above. I think there is no
risk to this change as none of the templates include `.tar.gz` files
currently, and all it does is fix something that's broken to begin with.

# Checklist

I think I'll check the same choices as Cedric did in this [similar
PR](#26639).

- [x] Documentation is up to date to reflect these changes (eg:
https://docs.expo.dev and README.md).
- [ ] Conforms with the [Documentation Writing Style
Guide](https://github.com/expo/expo/blob/main/guides/Expo%20Documentation%20Writing%20Style%20Guide.md)
- [ ] This diff will work correctly for `npx expo prebuild` & EAS Build
(eg: updated a module plugin).

---------

Co-authored-by: Jamie Birch <jamie@MacBook-Pro.local>
Co-authored-by: Cedric van Putten <github@cedric.dev>
@brentvatne brentvatne added the published Changes from the PR have been published to npm label Feb 6, 2024
byCedric added a commit that referenced this pull request Apr 10, 2024
…plates (#27212)

# Why

Continuing on from #26741 where I
merely kicked the can down the road, here I implement a proper lasting
solution.

As explained in the previous PR, both `create-expo` and `@expo/cli`
corrupt certain files when extracting templates due to how they rename
"HelloWorld" to the user's specified app name during extraction.
Specifically, this seems to cause binary files (e.g. executables) to
double in size, presumably due to an accidental encoding change.

While select file types are spared from this process [based on their
file
extension](https://github.com/expo/expo/blob/ce141b51fb8aaaa8e11038dd997ab6a3f22f2429/packages/create-expo/src/createFileTransform.ts#L60-L73),
it's not sustainable to guess every possible file extension that should
be exempt (we're missing `.zip`, for one thing), and overlooks files
that have no extension at all (like binaries).

So I thought that maybe we've been doing it the wrong way round. Instead
of renaming *all but some* files, we should **explicitly select which
ones get renamed**. This would be configurable, of course – we'd have
sensible defaults that would avoid most devs ever having to think about
the process, but we'd also support templates including their own config
to handle any exotic cases.

Importance: this allows us to safely commit to templates the following
files:
* prebuilt binaries like `.xcframework`s
* arbitrary compressed files like `.zip`, for reducing download times
* more media types (all kinds of videos)
* files that you actually want to write "HelloWorld" into, for some
reason

… It also finally makes the renaming procedure atomic and easily
testable (hence the many new tests).

# How

<!--
How did you build this feature or fix this bug and why?
-->

The feature is implemented in both `@expo/cli` and `create-expo`:

* I removed the file contents rewriting process from the tar extractor
(see changes to `createFileTransform.ts`).
* I kept the file path rewriting process of the tar extractor, however,
as that seemed to do more good than harm (but yes, maybe deserves a
followup).
* I introduced two post-extraction steps:
1. `getTemplateFilesToRenameAsync()`, which accepts a "rename config",
or otherwise falls back to the CLI's default (which hopefully covers
most cases in existence anyway) to determine which files should be
renamed
2. `renameTemplateAppNameAsync()`, which, for a given list of files,
renames “HelloWorld” to the desired app name with appropriate
sanitsation
* I added unit tests and E2E tests for each of these functions
* I worked out a default rename config (search for
`defaultRenameConfig`) that seems to cover all template cases in the
monorepo, though I may have possibly overlooked some cases in templates
outside the monorepo.

## About the rename config

See the documentation in the updated `packages/create-expo/README.md`.

# Test Plan

<!--
Please describe how you tested this change and how a reviewer could
reproduce your test, especially if this PR does not include automated
tests! If possible, please also provide terminal output and/or
screenshots demonstrating your test/reproduction.
-->

I've added lots of automated tests to give proper confidence, but manual
testing can be performed by building Create Expo and using them:

* Check out the branch of interest (either `main` or this PR,
`@shirakaba/extract-binaries-safely`)
* Build Create Expo and create a template with it
  ```sh
  # cd to wherever you keep your Expo monorepo
  cd ~/expo
  
  # Build create-expo
  cd packages/create-expo
  yarn build
  
  # cd to some other work directory. We'll build a template
  # in there.
  cd ~/Downloads
  
  ~/expo/packages/create-expo/build/index.js
  # Make an app called "ByeWorld", then move it into a
  # `new` or `old` directory for easy comparison later
  ```
* Having generated a template using a build of Create Expo derived from
both `main` and this PR's branch, you can finally compare the created
templates to assess that there have been no regressions. If this command
produces no output, then there's no difference (and thus no regression):
  ```sh
  diff -r --exclude=.git new/ByeWorld old/ByeWorld
  ```


![image](https://github.com/expo/expo/assets/14055146/4271aeb0-1b4d-4d7e-92e4-927bca060af4)


# Checklist

- [x] Documentation is up to date to reflect these changes (eg:
https://docs.expo.dev and README.md).
- [ ] Conforms with the [Documentation Writing Style
Guide](https://github.com/expo/expo/blob/main/guides/Expo%20Documentation%20Writing%20Style%20Guide.md)
- [x] This diff will work correctly for `npx expo prebuild` & EAS Build
(eg: updated a module plugin).

# Related

#27071

---------

Co-authored-by: Jamie Birch <jamie@MacBook-Pro.local>
Co-authored-by: Cedric van Putten <me@bycedric.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot: passed checks ExpoBot has nothing to complain about published Changes from the PR have been published to npm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants