what to do to update to the latest dataverse 4.6.2 #8

Open
moayadnajd opened this issue Jun 10, 2017 · 43 comments

@moayadnajd

moayadnajd commented Jun 10, 2017

What should I change to make it work with the latest version?
I changed the download link to 4.6.2, but it is not working.

@moayadnajd moayadnajd changed the title what to do to what to do to update to the latest dataverse 4.6.2 Jun 10, 2017
@pdurbin

pdurbin commented Jul 2, 2017

Dataverse 4.7 is out: https://github.com/IQSS/dataverse/releases/tag/v4.7

I'm one of the developers. How can I help?

@pdurbin

pdurbin commented Jul 2, 2017

Also, the Dataverse team could use some help getting up to speed with Docker. Please see IQSS/dataverse#3938 about how we'd like to start by attempting to use Docker in development environments. Production use would potentially follow.

@omaralsoudanii

omaralsoudanii commented Jul 6, 2017

@pdurbin we want to modify the docker file in this repo to deploy from the latest version (4.7)

@craig-willis
Collaborator

@pdurbin @moayadnajd Sorry, didn't see this issue until it was called out in the Dataverse repo. I'll see what needs to be done to upgrade to 4.6.x and 4.7.

@moayadnajd In IQSS/dataverse#3938 you mentioned that you need to use handles. Could you let me know what you need and what you tried?

@moayadnajd
Author

@craig-willis We are working on a platform that sends datasets through the API, and that works well. However, we have a handle.net account, so we need to disable DOI and use the Handle support in 4.6.x or 4.7.
We tried changing this line

&& wget https://github.com/IQSS/dataverse/releases/download/v4.2.3/dvinstall-4.2.3.zip \
to 4.7. The build completed, but there were errors during database creation and while generating the default data.
I am not fully familiar with Dataverse, so I don't know how to debug that process. Thanks for your help.
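
The version bump described above touches both the download URL and the archive name, and the two releases quoted in this thread name the archive differently (`dvinstall-4.2.3.zip` for v4.2.3, plain `dvinstall.zip` for v4.7). A minimal sketch of building the URL follows. The helper name `dvinstall_url` is hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper: print the release download URL for a Dataverse installer.
# $1 = version (e.g. 4.7), $2 = archive name (defaults to dvinstall.zip).
dvinstall_url() {
    version="$1"
    archive="${2:-dvinstall.zip}"
    printf 'https://github.com/IQSS/dataverse/releases/download/v%s/%s\n' "$version" "$archive"
}

# Older releases embed the version in the archive name, so pass it explicitly.
dvinstall_url 4.2.3 dvinstall-4.2.3.zip
dvinstall_url 4.7
```

Whatever archive name is downloaded, the `unzip` line in the Dockerfile has to refer to the same file.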

@moayadnajd
Author

CC @omaralsoudanii

@pdurbin

pdurbin commented Jul 6, 2017

@moayadnajd can you please make a pull request with your change to that line? You'll need to change the "unzip" line below it as well.

@moayadnajd
Author

@pdurbin Yes, I did change the unzip line. As I said, the build completed. I will make a pull request.

@craig-willis
Collaborator

craig-willis commented Jul 6, 2017

@moayadnajd One of the bigger changes from the usual Dataverse deploy process is that we pre-generate the database DDL. This needs to be done for each version. I don't know if this is still the case, but in previous versions Dataverse used EclipseLink to create the database schema during initial startup.

@pdurbin You don't happen to have the generated DDL/schema handy for 4.6.x and 4.7? I usually generate it by changing the ddl-generation.output-mode to script during startup.

@moayadnajd
Author

#9

@moayadnajd
Author

@craig-willis How can I change the DDL? Is it easy to do?

@moayadnajd
Author

@craig-willis I saw this link: https://datasets.socialhistory.org/dataset.xhtml?persistentId=hdl:10622/HPIC74 They are using Handle instead of DOI in v4.3.
If we change the version in the Dockerfile to 4.3, will we have Handle support?

@craig-willis
Collaborator

@moayadnajd I need to document the DDL generation process, it's not very straightforward. I also expect other things have changed in the install process between 4.2 and 4.7.

I'm not sure about handle support in 4.3. The documentation seems to suggest that handle support is incomplete in 4.3 (http://guides.dataverse.org/en/4.3/installation/config.html).

@moayadnajd
Author

@craig-willis Oh, so we will wait for an update to the Docker image. When do you expect it to be ready for the latest version?

@craig-willis
Collaborator

@moayadnajd I'll try to have something in the next day or two.

@pdurbin Disregard my previous question, I've generated the DDL.

@craig-willis
Collaborator

@moayadnajd I've pushed two images to Dockerhub from my personal fork, one for Dataverse 4.7 and one for Solr (it looks like the schema.xml changed since we last built our images).

craigwillis/dataverse:4.7
craigwillis/dataverse-solr:4.7

You can see the changes here:
master...craig-willis:upgrade-4.7

This hasn't been fully tested and I've done nothing specific to change the configuration for handles. Any feedback welcome.
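
For anyone trying the two images named above, the pull commands can be sketched as a dry run that only prints what would be executed. The function name `pull_commands` and the `TAG` variable are illustrative assumptions. Only the `4.7` tag is mentioned in this thread.

```shell
#!/bin/sh
# Dry run: print the pull commands for the two images named above.
# Nothing is pulled unless the output is piped to a shell.
# TAG is an assumption for illustration; only 4.7 is mentioned in this thread.
TAG="${TAG:-4.7}"

pull_commands() {
    for image in craigwillis/dataverse craigwillis/dataverse-solr; do
        printf 'docker pull %s:%s\n' "$image" "$1"
    done
}

pull_commands "$TAG"
# To actually pull:  pull_commands "$TAG" | sh
```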

@pdurbin pdurbin mentioned this issue Jul 6, 2017
@pdurbin

pdurbin commented Jul 6, 2017

@craig-willis thanks! @moayadnajd @omaralsoudanii the Handle config is documented at http://guides.dataverse.org/en/4.7/installation/config.html#persistent-identifiers-and-publishing-datasets but if you have any trouble with Handle support, please open an issue at https://github.com/IQSS/dataverse/issues . Thanks.

I was wondering what DDL meant (talking about it with @pameyer and @bjonnh at http://irclog.iq.harvard.edu/dataverse/2017-07-06#i_54184 ) but from master...craig-willis:upgrade-4.7 it's now obvious to me that it comes from <property name="eclipselink.ddl-generation" value="create-tables"/>.

@craig-willis
Collaborator

Thanks, @pdurbin. By DDL I was referring to data definition language, aka SQL schema.

During my initial Dockerization of Dataverse, I had problems with EclipseLink as it is configured. In the Docker environment, containers can be easily brought up and down. The current persistence.xml defaults to “create-tables” when the webapp is deployed. If a container is restarted, resulting in redeployment of the webapp, startup fails during table creation. I worked around this with the static DDL/schema which is used to initialize the database during container startup if it's never been run before, and setting eclipselink.ddl-generation to "none" in persistence.xml.

There may be a better way to do this with EclipseLink, but I never found it.
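
The workaround described above has two parts: turning off EclipseLink table creation, and loading a static schema only on first start. A rough sketch of both follows. It is not the code used in this repo. The function names are hypothetical, and the marker file is only one assumed way of detecting a first run (the actual image may check the database instead).

```shell
#!/bin/sh
# Sketch of the workaround described above. All names here are hypothetical.

# Switch EclipseLink from "create-tables" to "none" in a persistence.xml file.
# $1 = path to persistence.xml; a .bak copy of the original is kept.
disable_ddl_generation() {
    sed -i.bak \
        's|name="eclipselink\.ddl-generation" value="create-tables"|name="eclipselink.ddl-generation" value="none"|' \
        "$1"
}

# Run a schema-loading command only if it has never succeeded before.
# $1 = marker file; remaining arguments = the command to run.
init_schema_once() {
    marker="$1"
    shift
    if [ -e "$marker" ]; then
        echo "schema already initialized, skipping"
        return 0
    fi
    # The marker is only written when the command succeeds.
    "$@" && touch "$marker"
}

# Possible usage at container startup (paths, user, and database are placeholders):
#   disable_ddl_generation /path/to/persistence.xml
#   init_schema_once /path/to/.schema_loaded psql -U DB_USER -d DB_NAME -f /path/to/schema.sql
```

With this shape, restarting a container redeploys the webapp without re-running table creation, which is the failure mode described above.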

@moayadnajd
Author

@craig-willis Thank you for your help. We will start testing and give you feedback soon.

@omaralsoudanii

@pdurbin @craig-willis The build is working now, but Handle registration fails,
and trying to publish a dataset results in this error:
Error – This dataset may not be published because the Handle Service is currently inaccessible. Please try again. Does the issue continue to persist? If you believe this is an error, please contact Root Support for assistance.

Clicking the handle link in the dataset leads to this page on handle.net:
Error - Not Found

The handle you requested --

20.500.11766/SHEQW3

-- cannot be found.

Please contact us if you wish to report this error. Please include information regarding where you found the handle

We added the private key file through the JVM option dataverse.handlenet.admcredfile

and used curl to set the database settings for handle generation, as described in the documentation:

curl -X PUT -d doi http://localhost:8080/api/admin/settings/:Protocol

curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority

Is there any way to debug the handle errors or check what the problem is?
Thanks

@craig-willis
Collaborator

@omaralsoudanii I'm not familiar with the handle configuration. With our Docker image, the log file should be in /usr/local/glassfish4/glassfish/domains/domain1/logs in the Dataverse container.
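
As a starting point for looking through that log directory, a small filter like the one below may help. It is only a sketch: the helper name is hypothetical, the file name `server.log` is an assumption about the GlassFish default, and matching on the word "handle" is a guess at what the relevant entries contain.

```shell
#!/bin/sh
# Hypothetical helper: show log lines that mention "handle" (case-insensitive),
# prefixed with their line numbers.
# $1 = log file; defaults to server.log in the directory mentioned above.
# The file name server.log is an assumption about the GlassFish default.
LOG_DIR=/usr/local/glassfish4/glassfish/domains/domain1/logs

handle_log_entries() {
    grep -i -n 'handle' "${1:-$LOG_DIR/server.log}"
}

# Possible usage from the host (the container name is a placeholder):
#   docker exec CONTAINER_NAME grep -i -n 'handle' "$LOG_DIR/server.log"
```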

@pdurbin

pdurbin commented Jul 10, 2017

@omaralsoudanii for Handle you should be using "hdl" rather than "doi" like this:

curl -X PUT -d hdl http://localhost:8080/api/admin/settings/:Protocol

For more on this topic, please see http://guides.dataverse.org/en/4.7/installation/config.html#configuring-dataverse-for-handles

@omaralsoudanii for help with Dataverse, please contact us at http://guides.dataverse.org/en/4.7/installation/intro.html#getting-help
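
The two settings calls discussed above can be sketched as a dry run that prints the curl commands instead of executing them, so the values can be checked before they are sent. `settings_put_cmd`, `BASE_URL`, and the `YOUR_HANDLE_PREFIX` placeholder are illustrative names, not part of Dataverse.

```shell
#!/bin/sh
# Dry-run sketch: print (rather than run) the settings calls for Handle support.
# BASE_URL and YOUR_HANDLE_PREFIX are placeholders to replace with real values.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# $1 = setting name (e.g. :Protocol), $2 = value
settings_put_cmd() {
    printf 'curl -X PUT -d %s %s/api/admin/settings/%s\n' "$2" "$BASE_URL" "$1"
}

settings_put_cmd :Protocol hdl
settings_put_cmd :Authority YOUR_HANDLE_PREFIX
```

Once the printed commands look right, they can be pasted into a shell on the Dataverse host.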

@omaralsoudanii

omaralsoudanii commented Jul 10, 2017

@pdurbin Sorry, I just copied them from the documentation. I know the value should be hdl. These are the real values for my curl commands:
curl -X PUT -d hdl http://data.mel.cgiar.org/api/admin/settings/:Protocol
curl -X PUT -d 20.500.11766 http://data.mel.cgiar.org/api/admin/settings/:Authority

You can find the handle in use on our DSpace production server at https://mel.cgiar.org/repo.
We want to use the same handle on our Dataverse server at http://data.mel.cgiar.org

@pdurbin

pdurbin commented Jul 10, 2017

@omaralsoudanii ah, ok. Hmm. I don't know a lot about Handle support and it's pretty new, included in Dataverse 4.6.2 and higher. Can you please try one or more of the channels listed at http://guides.dataverse.org/en/4.7/installation/intro.html#getting-help ? Thanks!

@omaralsoudanii

@pdurbin Thank you, I posted there. The logs show this error:
[2017-07-11T07:02:23.723+0000] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.HandlenetServiceBean] [tid: _ThreadID=28 _ThreadName=http-listener-1(2)] [timeMillis: 1499756543723] [levelValue: 1000] [[
Can't load private key in null: java.lang.NullPointerException]]
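
The `null` in this message may indicate that the JVM option `dataverse.handlenet.admcredfile` is not visible to the running server, although that reading is an assumption and not confirmed in this thread. A quick way to test it is to filter the server's JVM options, sketched below with a hypothetical helper that reads the options on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read JVM options on stdin and report whether the
# Handle credential option from this thread is present.
check_admcredfile() {
    if grep -q -- '-Ddataverse\.handlenet\.admcredfile='; then
        echo "admcredfile option found"
    else
        echo "admcredfile option missing"
        return 1
    fi
}

# Possible usage inside the Dataverse container (asadmin location depends on the image):
#   asadmin list-jvm-options | check_admcredfile
```

If the option is reported as present, the next things to check would be that the file exists at that path inside the container and is readable by the GlassFish user.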

@pdurbin

pdurbin commented Jul 11, 2017

@omaralsoudanii thanks, I replied to you at http://irclog.iq.harvard.edu/dataverse/2017-07-11

@pdurbin

pdurbin commented Sep 14, 2017

@craig-willis hi! I just left a comment at IQSS/dataverse#4040 (comment) about how I'm trying to use your images on DockerHub in an OpenShift environment. Do you have any time to jump in #dataverse on freenode or wherever you like to talk about what I'm up to? 😄

@pdurbin

pdurbin commented Sep 18, 2017

I had a nice chat today with @bodom0015 about this repo at https://gitter.im/nds-org/ndslabs?at=59bff3d9cfeed2eb65247bfb . I got the NDS Dataverse 4.2.3 Docker image deployed to OpenShift and I'm working through problems. I'm posting status updates at IQSS/dataverse#4040

@pdurbin

pdurbin commented Mar 9, 2018

@craig-willis @moayadnajd @omaralsoudanii heads up that I'm planning on chatting with @aculich and others about Docker and Kubernetes in a couple hours (10:30am eastern) if you're interested in joining the discussion at #dataverse on freenode. It'll be logged at http://irclog.iq.harvard.edu/dataverse/2018-03-09 and for more background, you can read http://irclog.iq.harvard.edu/dataverse/2018-03-02#i_63995 . Thanks.

@craig-willis
Collaborator

Thanks @pdurbin. I wish I could join but will check out the chat logs.

@pdurbin

pdurbin commented Mar 9, 2018

@craig-willis no worries. I'm making an attempt to list in one place all the Docker and Kubernetes-related stuff going on so I hope you don't mind that I listed your GitHub username over on my "Dev Efforts by the Dataverse Community" spreadsheet at https://docs.google.com/spreadsheets/d/1pl9U0_CtWQ3oz6ZllvSHeyB0EG1M_vZEC_aZ7hREnhE/edit?usp=sharing

For more context on what that spreadsheet is about, please see my "Which GitHub issues are being worked on by the Dataverse community?" post at https://groups.google.com/d/msg/dataverse-community/X2diSWYll0w/ikp1TGcfBgAJ

At the moment, I'm trying to characterize your status as wanting to move to official IQSS Docker images once they have been blessed by IQSS. This is the "status" column of the spreadsheet above. Thanks!

@craig-willis
Collaborator

Thanks, @pdurbin. That sounds like a good characterization.

Read through notes of your discussion with @aculich -- thank you both for sharing. @Xarthisius @amoeba @mbjones -- you might also be interested
https://docs.google.com/document/d/12njDDHHfoNdos0By3lKlxYbRWLsAp8Eo-AGpjuBF_rY/edit

The idea of integrating repository systems like Dataverse with containerized analysis environments is something we've discussed extensively in the NDS meetings and is actively part of the Whole Tale project. On the one hand is supporting the ability for researchers to explore/analyze/collaborate around data in research repositories. On the other is defining a way to publish a new class of research object -- data + code/notebook + image or image definition -- that would complement a data or paper publication, and support users re-running these (e.g., via link from repository or publisher site).

The Whole Tale project (http://wholetale.org/) defines a "Tale" as shareable/preservable research objects that combine data + code/narrative (e.g., notebook) with the computational environment for reproducibility (e.g., Docker image definition). Initial collaboration is with DataOne and Globus to define a package format that would allow the tale to be published and then run on-demand by users via the Whole Tale or similar platform. Serialization format discussion is going on here whole-tale/whole-tale#24. Whole Tale is also developing a framework to pull data from external sources (e.g. via DOI/URL) for users to actively work on -- a sort of BinderHub with data.

If anyone is interested in continuing this discussion (I certainly am), maybe we can find a good place? We started a forum coming out of a related workshop this summer https://groups.google.com/forum/#!forum/container-analysis-environment, which hits others interested in this topic (SciServer, Cyverse, etc), but has almost no activity and misses the repository community.

@pdurbin

pdurbin commented Mar 14, 2018

@craig-willis I'm glad you got something out of those notes. 😄

My impulse is to say that this discussion should happen anywhere but in some random GitHub issue like this, but at least it's public so people can read it. More on this later.

I believe I first heard about the Whole Tale project when @victoriastodden gave a keynote address at the 2017 Dataverse Community Meeting entitled "Toward a Reproducible Scholarly Record". See slide 20 and beyond at https://osf.io/5euj9/ via https://projects.iq.harvard.edu/dcm2017/agenda#widget-2 . It sounds neat.

I'm not sure how to get computation people talking to repository people. I don't know if this is reflected in my notes but @aculich chatted about this problem. Some thoughts on various channels:

Of course, I'm not sure how much time I have to invest in this conversation personally. I'm happy to join a mailing list or whatever. Maybe you can come to a future Dataverse Community Meeting. 😄 Or maybe you and I could have a phone call some day to compare notes.

By the way, I did take a quick look at BinderHub yesterday. I put some first impressions into jupyterhub/mybinder.org-user-guide#80

@pdurbin

pdurbin commented Apr 9, 2018

If anyone is interested in continuing this discussion (I certainly am), maybe we can find a good place?

@craig-willis would you be interested in joining the discussion at https://groups.google.com/d/msg/dataverse-community/VG6gTMEd_Ps/Xy7jDhVoBwAJ ? It was kicked off last Friday by @sean-dooher, Kevin Yang, and @aculich

@omaralsoudanii

@pdurbin @craig-willis @moayadnajd
We have installed Dataverse 4.7 successfully on Docker using this repo.
We are wondering how to get the latest database changes in order to update to 4.8.
We tried changing the following:

RUN cd ~ \
&& wget https://github.com/IQSS/dataverse/releases/download/v4.7/dvinstall.zip \

To

RUN cd ~ \
&& wget https://github.com/IQSS/dataverse/releases/download/v4.8.6/dvinstall.zip \

After the build, accessing $DOMAIN_NAME/dataverse/root
results in a 500 error: storageidentifier column doesn't exist.

@pdurbin I noticed that the official Docker support for Dataverse is described as being for development only. Is there a production version?

Thank you!
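
The 500 error reported above suggests that the schema loaded by the image does not match what the 4.8.6 application expects, though that is an inference, not something confirmed here. One way to check is to ask PostgreSQL which tables currently have the column. The sketch below only prints the query. The helper name is hypothetical, and the database user and name in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print a SQL query listing the tables that have a given column.
# $1 = column name. information_schema.columns is a standard PostgreSQL view.
column_lookup_sql() {
    printf "SELECT table_name FROM information_schema.columns WHERE column_name = '%s';\n" "$1"
}

column_lookup_sql storageidentifier
# Possible usage (database user and name are placeholders):
#   column_lookup_sql storageidentifier | psql -U DB_USER -d DB_NAME
```

Running the same query against a database created by the newer release and comparing the two results would show where the schemas differ for this column.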

@pdurbin

pdurbin commented May 14, 2018

@omaralsoudanii thanks for opening IQSS/dataverse#4665 asking about a production version of Dataverse running in Docker. The short answer is no, there isn't one, but let's discuss further in that issue.

@pdurbin

pdurbin commented Sep 6, 2018

@craig-willis Are you aware of the new https://github.com/IQSS/dataverse-docker repo? It's being maintained by the community and was recently updated to the latest version of Dataverse, which is 4.9.2 (see IQSS/dataverse-docker#3 ). How do you feel about using it for NDS Labs Workbench?

@craig-willis
Collaborator

craig-willis commented Sep 13, 2018

@pdurbin We will be very happy to move to the community images. "Specs" describing application stacks for Workbench are pretty straightforward, so in the end this will be a PR to https://github.com/nds-org/ndslabs-specs/tree/master/dataverse.

A couple of things:

  • The current application stack also includes work with @jonc1438 to demonstrate integration with iRODS for an IASSIST workshop. I don't expect that this is being used, but wanted to at least note it here as it will likely not be available using the official images.
  • We prefer to use auto-built images, but those in https://hub.docker.com/r/iqss appear to be manually pushed. Is there another repo with the "official" or at least auto-built images?

@pdurbin

pdurbin commented Sep 17, 2018

@craig-willis hi! It was an absolute pleasure to chat with you at the 2018 Whole Tale workshop! I know the conversation in this GitHub issue is getting long, but I guess we'll keep going with it here.

As you indicated, the fix for this issue is to update the configs at https://github.com/nds-org/ndslabs-specs/tree/master/dataverse to point to a newer version of Dataverse than 4.2.3 (4.9.2 is the latest release as of this writing). Currently, Docker images are being pulled from https://hub.docker.com/r/ndslabs and you'd like to know where you can pull newer images from.

You should not pull images from https://hub.docker.com/r/iqss because @IQSS doesn't have the resources at this time to push images there and maintain them. Anything you find under "iqss" is still highly experimental and perhaps somewhat tied to OpenShift. You can read more about these experiments at http://guides.dataverse.org/en/4.9.2/developers/containers.html#future-production-use-on-minishift-openshift-kubernetes

So, if not "iqss", where should you pull images from?

I'm hoping that you can pull images that the Dataverse community has built. I recently created https://github.com/IQSS/dataverse-docker for the community and @4tikhonov @wilkos-dans and @xibriz from the Dataverse community have been iterating on the images there. What I don't know is if they have pushed images for public use such as your NDS Labs Workbench, which I have documented from the Dataverse perspective at http://guides.dataverse.org/en/4.9.2/installation/prep.html#nds-labs-workbench-for-testing-only . To be clear, NDS Labs Workbench is only for testing so if the Dataverse images aren't perfect, that's fine. They can continue to be improved in the future. I found some images at https://hub.docker.com/r/vtycloud/ that seem to belong to @4tikhonov. If he approves of the use of those images for the purpose of spinning up Dataverse for testing with NDS Labs Workbench, that's fine with me.

I also wanted to circle back to the conversation above "idea of integrating repository systems like Dataverse with containerized analysis environments" and how there seems to be no good place to discuss this. This issue probably isn't the best place. 😄 During the Whole Tale workshop I mentioned a new forum that was launched by @aculich and others at PEARC18 during a talk called "On Launching a Research Computing Q&A Site using StackExchange and Discourse". Here's the abstract from https://pearc18.conference-program.com/?page_id=10&id=bof117&sess=sess200

In September, 2017, the Northeast Cyberteam Initiative (https://necyberteam.org) began a project to build a Research Computing Q&A site which will allow the research computing community to achieve better/faster research results; and enable a sustainable workforce of facilitators to provide greater acceleration of research in the long term. At this BOF, we will explain the project; describe what it takes to create a new Q&A site on StackExchange and on Discourse; give an update on where we are in the process, and engage the audience to join the effort. Establishing a Q&A site of this nature requires some tenacity. Through the efforts of the authors of this BoF and many others who have contributed, we have gained some traction, and hope to engage the broader community to firmly establish this platform as a tool for the global research computing community.

The goal of this project is to aggregate answers to a broad spectrum of questions that are commonly asked as researchers utilize advanced computing resources, creating a self-service knowledge base for the community of domain researchers, facilitators, cyberinfrastructure (CI) engineers and others that do Research Computing. Making this knowledge readily available frees up time for facilitators and CI engineers to focus on more advanced subject matter, thereby elevating the practice.

Through the launch process we have been thinking seriously about what defines "research computing" in relation to other computing disciplines. The process has generated some very thoughtful discussion and we believe that it's of great benefit to our community to work on this definition as it relates to career development and the emergence of research computing as a distinct sub-discipline. Our hope is that not only will the Research Computing Q&A site become a great resource for the community, it will also provide a public testimony of the reality of research computing and how is exists in relation to IT, Computer Science, and domain research.

The website for the new forum is https://ask.cyberinfrastructure.org and perhaps it would be a good place to have a vendor-neutral discussion. I was involved a bit in the effort to seed the site with questions at launch and I know they are friendly to questions about data repositories. For example, I created the question at https://ask.cyberinfrastructure.org/t/where-can-i-find-introductory-material-on-publishing-research-data/154 and I would be happy to create another one about integrating data repositories with containerized analysis environments.

Anyway, it sounded like this new forum wasn't on your radar so I wanted to put it there. 😄

@craig-willis
Collaborator

@pdurbin Indeed -- it was great to finally meet you and I thoroughly enjoyed our discussions.

On the topic of the community Dataverse images, if you don't plan to push to the iqss Docker organization, perhaps a reasonable short-term solution would be for us to set up automated builds on Dockerhub that push to our ndslabs organization. We prefer auto-builds to simplify tracing the source of images and to flag image build errors -- and that the person/organization has to some extent committed to ensure the images work in Labs Workbench. We can revisit this if/when the Dataverse team provides official images.

Thanks for the pointer to https://ask.cyberinfrastructure.org. I was at PEARC and should've tracked on this, but apparently have a poor radar.

@pdurbin

pdurbin commented Nov 28, 2018

@craig-willis I wanted to give you a heads up that IQSS/dataverse#5317 was merged yesterday which means that you should have a bit more raw material to work with inside of Kubernetes, if you want. Specifically, we have added SQL files to create the database schema for Dataverse for all versions from 4.0 to 4.9.4 and have added instructions for ourselves to continue adding these SQL files for all future releases.

I considered mentioning this on whole-tale/whole-tale#49 but this issue seemed a bit more on topic. Maybe we should update the title from 4.6.2 to 4.9.4, since that's the latest Dataverse release. Time flies!

One other thing I'll mention is that I'd love for you to chat with @poikilotherm at some point. Over at http://irclog.iq.harvard.edu/dataverse/2018-11-28#i_80180 I was suggesting that I could create for him a git repo under IQSS called dataverse-kubernetes because he has some ideas of how to do docker and kubernetes properly, like you do, but he might benefit from a collaborator like yourself. I don't know. I'm always happy to create repos like this under IQSS, flag them as community-supported in our guides, and create the proper "team" in GitHub to let the hacking begin. Sometimes great things happen. Sometimes the projects never really take off. Mornings in http://chat.dataverse.org are usually a good time to catch him, because timezones. 😄

@craig-willis
Collaborator

Great news, @pdurbin. I'll hopefully have time to revisit this next week (also in the context of IQSS/dataverse-docker#8) and will make a note to try to connect on IRC.

@pdurbin

pdurbin commented Dec 11, 2018

@craig-willis please take a look at IQSS/dataverse#5373 because @poikilotherm wants to host a community call on Dataverse and Kubernetes. Everyone is welcome, of course!
