Test Github-Overleaf workflow #87

Closed
ShiqiYang2022 opened this issue Oct 9, 2023 · 16 comments · Fixed by #93

@ShiqiYang2022
Collaborator

This issue (#87) is opened for @gentzkow to test the Github-Overleaf workflow proposed in #84 (comment), while @ShiqiYang2022 provides support and iteratively fixes anything wrong or unclear. The instructions for the workflow can be found here. I suggest we use this issue as the one we sync to Overleaf.

@gentzkow: Hi Matt -- I did not provide any details beyond these instructions, because I think we want to mimic the experience of users outside the lab. But I am happy to provide more details if needed, thanks!

@gentzkow
Owner

@ShiqiYang2022 Just acknowledging that this is still on my plate -- sorry to be slow to get to it!

@gentzkow
Owner

@ShiqiYang2022 I'm finally getting down to testing this.

Some small notes that are not really specific to the Github-Overleaf workflow

  1. I'm failing when I try to run the /paper_slides/ directory of template locally. It looks like the .tex compile is failing because it is looking for dependencies (like figures.tex) to be in the same directory as make.py. The run just hangs on the pdflatex command. Are you not seeing this?
  2. Separately, it throws an error if the latex_auxiliary_dir already exists. We should handle this case.
  3. It looks like our slides.tex in template doesn't have widescreen aspect ratio. We should replace the current \documentclass command with

\documentclass[11pt,english,t,aspectratio=169]{beamer}

If you could open appropriate issues to address (1) and (2) and just go ahead and fix (3) in template that would be great.

ShiqiYang2022 added a commit that referenced this issue Oct 28, 2023
ShiqiYang2022 added a commit that referenced this issue Oct 28, 2023
@ShiqiYang2022
Collaborator Author

ShiqiYang2022 commented Oct 29, 2023

@gentzkow Thanks! Sorry for the inconvenience. Replies to #87 (comment):

  1. I hit the exact same error when I was shifting from the .lyx to the .tex compile, and I fixed it in gslab_make/issues#64. However, we forgot to update the version of the gslab_make submodule that we populate in template. So I believe this error comes from /lib/ referring to the outdated version of gslab_make. Apologies for the inconvenience.

I believe this is an easy fix, so I implemented it directly in b89a02f.

  2. This is an intentional design, because it keeps the intermediate output that we can use to debug any error. For details, see the thread in PR for #86: reformat template to fit tex compile #88 (comment). If paper_slides compiles successfully, it deletes latex_auxiliary_dir after producing the output.
  3. Thanks for pointing that out! I fixed it in d84c115.

I re-cloned the repo and did the full run in branch issue87_workflow_test on my end, and it now compiles successfully. I confirmed that it now pulls the correct version of gslab_make and that the slides now have the 16:9 widescreen aspect ratio.

(template) SIEPR-C02G50GUML86:github_folders shiqiyang$ git clone git@github.com:gentzkow/template.git
Cloning into 'template'...
remote: Enumerating objects: 2214, done.
remote: Counting objects: 100% (1098/1098), done.
remote: Compressing objects: 100% (564/564), done.
remote: Total 2214 (delta 633), reused 878 (delta 530), pack-reused 1116
Receiving objects: 100% (2214/2214), 9.46 MiB | 4.56 MiB/s, done.
Resolving deltas: 100% (1101/1101), done.
(template) SIEPR-C02G50GUML86:github_folders shiqiyang$ cd template
(template) SIEPR-C02G50GUML86:template shiqiyang$ git fetch origin
(template) SIEPR-C02G50GUML86:template shiqiyang$ git checkout issue87_workflow_test
branch 'issue87_workflow_test' set up to track 'origin/issue87_workflow_test'.
Switched to a new branch 'issue87_workflow_test'
(template) SIEPR-C02G50GUML86:template shiqiyang$ cp setup/config_user_template.yaml config_user.yaml
(template) SIEPR-C02G50GUML86:template shiqiyang$ git submodule init
Submodule 'lib/gslab_make' (https://github.com/gslab-econ/gslab_make.git) registered for path 'lib/gslab_make'
(template) SIEPR-C02G50GUML86:template shiqiyang$ git submodule update
Cloning into '/Users/shiqiyang/Documents/github_folders/template/lib/gslab_make'...
Submodule path 'lib/gslab_make': checked out 'c08d6254f94f259ebd6d0cad994979911ab578c6'
(template) SIEPR-C02G50GUML86:template shiqiyang$ python run_all.py

*************************
* Running module `data` *
*************************
{'root': '..', 'config': '/Users/shiqiyang/Documents/github_folders/template/config.yaml', 'lib': '/Users/shiqiyang/Documents/github_folders/template/lib', 'config_user': '/Users/shiqiyang/Documents/github_folders/template/config_user.yaml', 'input_dir': 'input', 'external_dir': 'external', 'output_dir': 'output', 'output_local_dir': 'output_local', 'makelog': 'log/make.log', 'output_statslog': 'log/output_stats.log', 'source_maplog': 'log/source_map.log', 'source_statslog': 'log/source_stats.log', 'versions_log': 'log/versions.log'}
Cleared: `output`
Cleared: `log`
Starting makelog file at: `/Users/shiqiyang/Documents/github_folders/template/data/log/make.log`
Input links successfully created!
External links successfully created!
Source logs successfully written!
Version logs successfully written!
Executing command: `python  "/Users/shiqiyang/Documents/github_folders/template/data/code/merge_data.py" `
Executing command: `python  "/Users/shiqiyang/Documents/github_folders/template/data/code/clean_data.py" `
Output logs successfully written!
WARNING! Certain files tracked by git exceed the config size limit (0.5 MB). See makelog for list of files.
Ending makelog file at: `/Users/shiqiyang/Documents/github_folders/template/data/log/make.log`

*****************************
* Running module `analysis` *
*****************************
Cleared: `output`
Cleared: `log`
Starting makelog file at: `/Users/shiqiyang/Documents/github_folders/template/analysis/log/make.log`
Input links successfully created!
External links successfully created!
Source logs successfully written!
Version logs successfully written!
Executing command: `python  "/Users/shiqiyang/Documents/github_folders/template/analysis/code/analyze_data.py" `
Output logs successfully written!
Ending makelog file at: `/Users/shiqiyang/Documents/github_folders/template/analysis/log/make.log`

*********************************
* Running module `paper_slides` *
*********************************
Removed: `/Users/shiqiyang/Documents/github_folders/template/paper_slides/input`
Cleared: `output`
Cleared: `log`
Starting makelog file at: `/Users/shiqiyang/Documents/github_folders/template/paper_slides/log/make.log`
Input copies successfully created!
External copies successfully created!
Source logs successfully written!
WARNING! The following target files have been modified according to git status:
/Users/shiqiyang/Documents/github_folders/template/data/output/chips_sold.pdf
Version logs successfully written!
Executing command: `pdflatex -output-directory=../latex_auxiliary_dir  "/Users/shiqiyang/Documents/github_folders/template/paper_slides/code/paper.tex"`
Executing command: `pdflatex -output-directory=../latex_auxiliary_dir  "/Users/shiqiyang/Documents/github_folders/template/paper_slides/code/online_appendix.tex"`
Executing command: `pdflatex -output-directory=../latex_auxiliary_dir  "/Users/shiqiyang/Documents/github_folders/template/paper_slides/code/slides.tex"`
Output logs successfully written!
Ending makelog file at: `/Users/shiqiyang/Documents/github_folders/template/paper_slides/log/make.log`

It would be great if you could confirm that those errors are properly addressed in branch issue87_workflow_test on your end, thanks!

@gentzkow
Owner

@ShiqiYang2022 Here is the result of my testing.

Manual Option

Creating a new project and uploading was quick and easy. The paper_slides directory compiled on Overleaf on the first try. Downloading the project and committing back to Github was also easy.

I also agree about the cons you had originally outlined here. Of these, I am not so worried about losing the granular commit history / attribution. That is more important for code than for .tex files. My guess is we would notice this difference rarely if ever.

I am more worried about the fact that you can't upload a whole directory to Overleaf (your Con number 4 here). E.g., if we have a new version of the /input/ directory that we want to import, we'd need to go through and upload the files in each subdirectory of it one-by-one. The solution I tested for this was using Dropbox sync. It's a little clunky, but this workflow seemed fine for me.

  1. Set up Dropbox Sync
  2. Create the new Overleaf project and upload new content as in the original instructions.
  3. To upload new or changed content to Overleaf, change it in the synced Dropbox directory

Mirror Repository Option

All of the instructions worked well for me. The only thing I noted was that in this step:

[Screenshot of the step in the instructions]

I needed to use the SSH address of the repo rather than the https address because I use SSH keys for authentication.

My overall conclusion is that this is a creative and really neat solution, but that the costs it adds are too high for me to recommend it as a standard workflow. I think for my own work, at least, the manual approach will be faster. If we can solve the upload issue with Dropbox sync as above the only big loss to the manual approach is the granular commit history, which I'm not so worried about.

My main concerns w/ the mirror repo approach are:

  1. Creating and deleting Github repos is a heavy step. Github is not built for this to be something one does frequently. I don't love having the sync repos mixed together with my other repos. Deleting a sync repo requires going through Github steps designed to prevent you from deleting a repo inadvertently.

  2. The steps for syncing content up and down are still pretty complicated (even with the automation) and I think they will take more time than the manual version in most cases. They also introduce a bunch of failure points where people could make mistakes and end up with a scrambled commit history.

  3. Github sync on Overleaf is fairly slow. Also, this approach requires that we upload the whole repo to Overleaf rather than just paper_slides.

I'm therefore going to vote that we add the manual approach to the lab manual and archive the mirror repo approach. I don't think we want either of these permanently on the template wiki. For the latter, let's make sure the instructions are stored (as PDF and/or markdown) here and also in the original #84 issue thread.

Possible Third Option?

One other thought I had: For some projects, people may prefer to have a really Overleaf-centric workflow where there is a single Overleaf project per Github repo and everybody makes edits there simultaneously even if they're part of separate issues. This also allows using Overleaf features like commenting, version history, etc.

I think we can allow this, if we follow something like the following:

  1. The structure of the Overleaf project mirrors the paper_slides directory

  2. Whenever we complete a Github issue that involved updating the Overleaf project we manually copy down the Overleaf files, run make.py, commit to the issue branch, and then merge to master.

  3. We should also be in the habit of including a draftable diff of the PDF as part of the final comment for all such issues.

The main loss of this relative to our main manual option is that the commits for one issue are going to get comingled with changes that are part of other issues. That's not great, but it seems a reasonable price to pay in cases where people feel strongly about using Overleaf in the "traditional" way. (Certainly better than what we currently do w/ the FB2020 projects where what is on Overleaf is never committed to the Github repo!)

Curious to hear your thoughts on this.

@ShiqiYang2022
Collaborator Author

@gentzkow Thanks for the very detailed comment! Confirming I read this and am re-thinking the options.

@gentzkow
Owner

@ShiqiYang2022 Thanks for the fixes here.

On (2), is there a reason we can't have make.py delete the latex_auxiliary_dir if it already exists when it is run? It's not a big hassle to delete it manually but when one is doing a lot of debugging it's helpful to be able to run repeatedly.

I checked that everything works on the branch -- perfect!

@ShiqiYang2022
Collaborator Author

@gentzkow Thanks!

On (2), is there a reason we can't have make.py delete the latex_auxiliary_dir if it already exists when it is run? It's not a big hassle to delete it manually but when one is doing a lot of debugging it's helpful to be able to run repeatedly.

I agree we need to delete latex_auxiliary_dir if it already exists. I opened gslab-econ/gslab_make#66 in the gslab_make repository to implement this.
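
For illustration, the behavior discussed here (clear the auxiliary directory at the start of each run so that make.py can be re-run during debugging) could look roughly like the sketch below. This is not the gslab_make code; `reset_dir` is a hypothetical helper name, and the sketch assumes the path is either missing or a directory.

```python
import shutil
from pathlib import Path


def reset_dir(path):
    """Delete `path` if it already exists, then recreate it empty.

    Hypothetical helper, not part of gslab_make. Assumes `path` is
    either missing or a directory (not a regular file).
    """
    path = Path(path)
    if path.exists():
        # Remove leftovers from a previous (possibly failed) compile.
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

Calling `reset_dir("latex_auxiliary_dir")` before the pdflatex step would then succeed whether or not an earlier run left the directory behind.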

@ShiqiYang2022
Collaborator Author

@gentzkow Thanks again for the detailed feedback. I went through and tried the workflows in #87 (comment); my thoughts are below.

Manual Option

I agree with your perspective on losing the granular commit history / attribution of .tex files. This is a trade-off we can accept if the only loss is the granularity of commits.

I tried Dropbox sync, and I think it is a good solution for uploading a whole directory in the Manual Option (and, thanks!). My only concern is that Dropbox synchronization is an Overleaf premium feature. Overleaf premium is available to Stanford users, but I worry that outside collaborators may not have access to it.

While trying the manual option, I noticed another issue that arises when several people collaborate:

Dropbox sync lets us upload the whole directory, but it gives no warning about file conflicts. Suppose Alice made changes to tab:A and fig:A, and then Bob made changes to tab:B and fig:B. With the mirror option, Bob cannot push his changes without rebasing and resolving the file conflicts. But with Dropbox sync, Bob might not be in the habit of checking the Overleaf history, and he might unintentionally upload /input/, overwriting Alice's update.

A potential solution is to make sure that (1) we always immediately update changes to /input/ both on Github and in Dropbox sync; and (2) we rebase onto collaborators' commits before updating Dropbox sync. In addition, in the manual option we need to ask collaborators to modify .tex files only on Overleaf, because we don't commit our .tex changes until the end of the issue.
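
As a rough illustration of a pre-upload check for the overwrite problem above, the sketch below lists every file that exists in both the local /input/ directory and the Dropbox-synced copy but with different contents. The function names and the idea of running such a check are assumptions for illustration only; this is not part of the workflow instructions. It cannot tell Bob's intended changes from Alice's, so it only produces a list to review before copying.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def files_to_be_overwritten(local_dir, synced_dir):
    """List relative paths present in both directories with different contents.

    Hypothetical helper: `local_dir` is the local input directory and
    `synced_dir` is its Dropbox-synced counterpart.
    """
    local_dir, synced_dir = Path(local_dir), Path(synced_dir)
    changed = []
    for local_file in sorted(local_dir.rglob("*")):
        if not local_file.is_file():
            continue
        rel = local_file.relative_to(local_dir)
        synced_file = synced_dir / rel
        # Only files that already exist on the synced side can be overwritten.
        if synced_file.is_file() and file_digest(local_file) != file_digest(synced_file):
            changed.append(rel.as_posix())
    return changed
```

If the returned list contains a file the uploader did not touch (for example Alice's table), that is a signal to pull and rebase before copying anything into the synced folder.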

Mirror Option

  1. Creating and deleting Github repos is a heavy step. Github is not built for this to be something one does frequently. I don't love having the sync repos mixed together with my other repos. Deleting a sync repo requires going through Github steps designed to prevent you from deleting a repo inadvertently.

  2. The steps for syncing content up and down are still pretty complicated (even with the automation) and I think they will take more time than the manual version in most cases. They also introduce a bunch of failure points where people could make mistakes and end up with a scrambled commit history.

  3. Github sync on Overleaf is fairly slow. Also, this approach requires that we upload the whole repo to Overleaf rather than just paper_slides.

Revisit of main concerns

I agree with all, and I would like to add my two thoughts:

  • For 1, the mirror option might not be designed mainly for PIs; in many cases RAs can create the mirror repo instead. The mirror repo does not need to be shared with everyone, so we can keep it from appearing on the PI's Github page.
  • For 2, I agree the fixed cost of setting up is relatively high. I think the mirror option is designed for long-lasting, large issues. For example, BLP-issue153 "Tailor simulations for new draft" lasted 7 months, with 400+ comments and 250 commits. Once the mirror option is set up, the marginal cost of a sync is 0, while the manual option requires a manual update for each new commit (MC $\ne$ 0). So for that kind of issue, I prefer mirror to manual.

For the decision: I agree to add manual and archive mirror. I propose that the lab manual link to the archived mirror instructions and suggest using mirror when we expect an issue to be long and large.

Third Option (Overleaf-centric workflow)

I think this is a feasible option. The Overleaf-centric workflow will fail badly only if two separate issues both need large changes to the text (because the text edits from the two issues will be scrambled together). If only one issue focuses mainly on text improvement, the workflow works for me.

In the worst case, we use this workflow when there is more than one text-editing issue. I think that would also not be a big problem: people who feel strongly about using Overleaf in the "traditional" way implicitly accept the cost of commingled commits.

I agree that we should include a draftable diff of the PDF as part of the final comment for all such issues. I also propose that, in such issues, we ask contributors whose text edits belong to other issues to review the text changes in the associated PR.

Minor Replies

[Screenshot of the step in the instructions]

I needed to use the SSH address of the repo rather than the https address because I use SSH keys for authentication.

Thanks for the great catch! I updated this in the instructions.

Thanks!

@gentzkow
Owner

Brilliant. Thank you @ShiqiYang2022! I agree w/ all points above.

With that, I think you can close out this issue and we can wrap the work on Overleaf sync. Thanks for all your very hard work here!!

@ShiqiYang2022
Collaborator Author

ShiqiYang2022 commented Oct 29, 2023

@gentzkow Thanks! It's so nice to see that we have converged on Overleaf sync. The next steps are:

(1) Address the problems in gslab_make and open a PR for this issue.
(2) Implement the decisions in #87 (comment) and revise the workflow accordingly.
(3) Open an issue in lab-manual and add the workflow to the lab manual Wiki.

I will implement the items listed above next week.

@ShiqiYang2022
Collaborator Author

ShiqiYang2022 commented Nov 1, 2023

@gentzkow

I am thinking of an improvement that might make the manual option easier to manage. Instead of creating a mirror Overleaf project for each issue, we create a mirror /paper_slides/ folder in the main Overleaf project.

For instance, for this issue #87, we create an auxiliary folder called /paper_slides_87/ that lives in the same directory as paper_slides in the main Overleaf project. We sync to Dropbox and update inputs in /paper_slides_87/input/, and commit changes as we previously agreed. When issue #87 is closed, we simply delete the /paper_slides_87/ subfolder, just as we delete the issue branch. This way, we avoid creating (and sharing with collaborators) a mirror Overleaf project for every single issue.

The potential loss is that collaborators might by mistake update the /paper_slides/ folder used for the master branch in Overleaf. But this mistake could also happen when we use mirror Overleaf projects, so this seems like a loss we are willing to bear.
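
To make the folder lifecycle concrete, here is a minimal sketch of creating and removing a per-issue folder in a Dropbox-synced copy of the Overleaf project. It assumes the project is available as a local directory through Dropbox sync; the function names and arguments are hypothetical and only illustrate the paper_slides_<issue number> naming described above.

```python
import shutil
from pathlib import Path


def issue_folder(project_root, issue_number):
    """Return the per-issue folder path, e.g. <project_root>/paper_slides_87."""
    return Path(project_root) / f"paper_slides_{issue_number}"


def open_issue_folder(repo_paper_slides, project_root, issue_number):
    """Copy the repo's paper_slides directory into a new per-issue folder.

    Raises FileExistsError if the issue folder already exists, so an
    existing folder is never silently overwritten.
    """
    target = issue_folder(project_root, issue_number)
    shutil.copytree(repo_paper_slides, target)
    return target


def close_issue_folder(project_root, issue_number):
    """Delete the per-issue folder once the issue is closed."""
    shutil.rmtree(issue_folder(project_root, issue_number))
```

For this issue, `open_issue_folder("template/paper_slides", synced_project_dir, 87)` would create paper_slides_87, and `close_issue_folder(synced_project_dir, 87)` would remove it when the issue branch is deleted, where `synced_project_dir` stands for wherever the synced project lives locally.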

Let me know your thoughts on this, thanks!

@gentzkow
Owner

gentzkow commented Nov 5, 2023

Yes, great idea!

Do we need to have main /paper_slides/ folder? Why not only have the issue ones?

@ShiqiYang2022
Collaborator Author

ShiqiYang2022 commented Nov 5, 2023

@gentzkow Thanks! I agree that the main /paper_slides/ folder is not necessary.

What I previously had in mind was having a full copy of the Github repo synced to Overleaf (using Github sync), as we did in FB2020, and adding auxiliary /paper_slides_issueXX/ folders. That way, if we want to make edits when there is no issue folder, we can edit and commit directly to the master branch using Github sync.

I think it's better to have all collaborators follow the standard workflow, i.e. creating issues for text edits, so I agree the main /paper_slides/ folder is unnecessary. I will edit the workflow proposed in lab-manual accordingly.

@gentzkow
Owner

gentzkow commented Nov 5, 2023 via email

@ShiqiYang2022
Collaborator Author

ShiqiYang2022 commented Nov 5, 2023

Per #87 (comment), and given that all items in #87 (comment) are properly addressed, I am closing this issue.

ShiqiYang2022 added a commit that referenced this issue Nov 5, 2023
* #87 Hot fix of gslab_make version

* #87 Fix widescreen ratio

* #87 Update gslab_make to delete aux folder
@ShiqiYang2022
Collaborator Author

Summary + Deliverables

In this issue (#87) we tested the Github-Overleaf workflow proposed in #84 (comment) and revised the workflow according to #87 (comment) and #87 (comment). This thread continued in the associated PR (#93).

We added our workflow to lab-manual/wiki. The archived mirror option is mirror-repo-workflow.pdf.

Merged to master in 1f2d868.
Final state of the issue branch here.
