
Some issues with installing the package #12

Closed · ViralBShah opened this issue Jun 20, 2020 · 15 comments
ViralBShah commented Jun 20, 2020

I first tried the instructions in the README, but they fail due to the hardcoded paths in the Manifest.

The package ships a Manifest.toml as well as a Project.toml. I think it would be sufficient to ship just the Project.toml. In fact, I had to delete the Manifest to get the package to dev correctly, since the Manifest contains paths hardcoded to the author's machine.
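
For reference, the workaround looked roughly like this (the path and exact commands are illustrative, not documented install steps):

using Pkg
# dev the repository; if resolution fails because of the shipped Manifest,
# delete the Manifest.toml from the checkout and resolve again.
Pkg.develop(url="https://github.com/jonathan-laurent/AlphaZero.jl")
dir = joinpath(Pkg.devdir(), "AlphaZero")
rm(joinpath(dir, "Manifest.toml"); force=true)
Pkg.activate(dir)
Pkg.instantiate()
Pkg.test()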

jonathan-laurent (Owner) commented

Thanks for reporting this! Indeed, it appears that my last commit on master broke the build.

The package ships a Manifest as well as a Project.toml. I think it would be sufficient to ship just the Project.toml.

To be clear, isn't it common practice for projects that are not meant to be used as dependencies themselves to include a Manifest file, so as to pin all dependencies to specific versions?
In any case, I agree it would be preferable to remove the Manifest file and put version bounds for the dependencies in the Project.toml.

There's a dependency on CuArrays.jl, but that should be deleted since it has now been merged into CUDA.jl.
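
Concretely, the cleanup would go in a direction like this (just a sketch, not the final change):

using Pkg
Pkg.activate(".")     # assuming the AlphaZero.jl checkout is the active project
Pkg.rm("CuArrays")    # drop the dependency that has been merged into CUDA.jl
Pkg.add("CUDA")
# ...plus [compat] entries in Project.toml so the remaining dependencies get version bounds.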

I will do this and fix the master branch later today. Thanks again!

ViralBShah (Author) commented

Deleting the Manifest (and resolving some old versions of packages I had) was sufficient to get using AlphaZero to work and to start running the tests.

I noticed that Flux support is commented out. It would be nice to avoid installing unnecessary packages via the Project.toml.

ViralBShah (Author) commented

Actually, I had to add CuArrays.jl to make it work. I seem to have ended up with a bad dependency state, but I do have it running, and adding CuArrays was necessary for that.

You're right that Manifests are good for reproducibility, but the current Manifest has a hardcoded path that causes things to fail. Especially for folks in the community who want to poke around the package, being able to add and dev it is nice.

ViralBShah (Author) commented

The tests did not run successfully:

Test Summary: | Pass  Total
Testing Games | 5278   5278
Dummy Runs: Error During Test at /home/viralbshah/.julia/dev/AlphaZero/test/runtests.jl:16
  Test threw exception
  Expression: dummy_run(Tictactoe)
  UndefVarError: lib not defined
  Stacktrace:
   [1] _broadcasted(::typeof(+), ::Knet.KnetArray{Float32,2}, ::Knet.KnetArray{Float32,1}, ::Knet.KnetArray{Float32,2}, ::Tuple{Tuple{Int64,Int64},Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Bool}) at /home/viralbshah/.julia/packages/Knet/bTNMd/src/binary.jl:65

ViralBShah (Author) commented

I wiped out .julia and started afresh:

(@v1.4) pkg> dev git@github.com:jonathan-laurent/AlphaZero.jl.git
    Cloning git-repo `git@github.com:jonathan-laurent/AlphaZero.jl.git`
  Resolving package versions...
ERROR: expected package `CuArrays [3a865a2d]` to exist at path `/home/jonathan/.julia/dev/CuArrays`

If I wipe the Manifest.toml, it then goes through. I do, however, think that the Project.toml may not need CUDAapi and CuArrays: CUDA.jl should be sufficient going forward, so long as it is lower-bounded correctly.
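
As a quick sanity check (a sketch; assumes the dev'ed AlphaZero checkout is activated), the direct dependencies recorded in Project.toml can be listed, which is what shows CUDAapi and CuArrays still being pulled in directly:

using Pkg
Pkg.activate(joinpath(Pkg.devdir(), "AlphaZero"))
# Print the direct dependencies from Project.toml, sorted by name.
foreach(println, sort!(collect(keys(Pkg.project().dependencies))))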

ViralBShah (Author) commented

Another thing I noticed: does the package need both JSON2 and JSON3 as direct dependencies?

jonathan-laurent (Owner) commented

Another thing I noticed: does the package need both JSON2 and JSON3 as direct dependencies?

I think I use JSON2 for its pretty printing function, which did not exist in JSON3 last time I checked.
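
(If newer JSON3 releases do provide pretty printing, something like the following sketch might make JSON2 unnecessary; whether JSON3.pretty is available depends on the JSON3 version, so this needs checking:)

using JSON3
# Hypothetical example value; the point is only the pretty-printing call.
params = Dict("network" => "SimpleNet", "batch_size" => 64)
JSON3.pretty(JSON3.write(params))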

ViralBShah (Author) commented

Are the tests failing because I don't have Knet set up correctly for GPUs? Is a GPU a must for running the code?

Dummy Runs: Error During Test at /home/viralbshah/.julia/dev/AlphaZero/test/runtests.jl:16
  Test threw exception
  Expression: dummy_run(Tictactoe)
  Cannot find cudnn
  Stacktrace:
   [1] error(::String) at ./error.jl:33
   [2] cudnnCreate() at /home/viralbshah/.julia/packages/Knet/bTNMd/src/gpu.jl:306
   [3] cudnnhandle(::Int64) at /home/viralbshah/.julia/packages/Knet/bTNMd/src/gpu.jl:213
   [4] cudnnhandle() at /home/viralbshah/.julia/packages/Knet/bTNMd/src/gpu.jl:206
   [5] batchnorm2(::Knet.KnetArray{Float32,2}, ::Knet.KnetArray{Float32,2}, ::Knet.KnetArray{Float32,2}; moments::Knet.BNMoments, training::Bool, o::Base.Iterators.Pairs{Symbol,Knet.BNCache,Tuple{Symbol},NamedTuple{(:cache,),Tuple{Knet.BNCache}}}) at /home/viralbshah/.julia/packages/Knet/bTNMd/src/batchnorm.jl:496
   [6] batchnorm(::Knet.KnetArray{Float32,2}, ::Knet.BNMoments, ::AutoGrad.Param{Knet.KnetArray{Float32,1}}; training::Bool, o::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/viralbshah/.julia/packages/Knet/bTNMd/src/batchnorm.jl:90
   [7] (::AlphaZero.KNets.BatchNorm)(::Knet.KnetArray{Float32,2}) at /home/viralbshah/.julia/dev/AlphaZero/src/networks/knet/layers.jl:84
   [8] (::AlphaZero.KNets.Chain)(::Knet.KnetArray{Float32,2}) at /home/viralbshah/.julia/dev/AlphaZero/src/networks/knet/layers.jl:19 (repeats 2 times)
   [9] forward(::SimpleNet{Main.Tictactoe.Game}, ::Knet.KnetArray{Float32,4}) at /home/viralbshah/.julia/dev/AlphaZero/src/networks/knet.jl:148

jonathan-laurent (Owner) commented

AlphaZero.jl is supposed to work in the absence of a GPU.
Are you getting this error on master and does your computer have a GPU with CUDA?

ViralBShah (Author) commented Jun 20, 2020

Yes, I am using master. If the package has a registered version in the General registry, I should probably use that, but I just started with the suggestion in the README.

I do have a GPU with the CUDA libraries, but I haven't built Knet for CUDA just yet, if that is required. I thought it would be good to try out the package without the GPU first, get its tests working, and then try the CUDA version.

jonathan-laurent (Owner) commented

As you said, your version of Knet is not compiled for CUDA.
However, you should still not see this error, since the default tests are not supposed to use the GPU, so this is a bug on my end.
I am looking at this right now.
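
(One possible direction, though not necessarily how the fix will end up being implemented: fall back to plain Arrays whenever Knet reports no usable GPU, so the default tests never touch cudnn.)

using Knet
# Knet.gpu() returns the active device id, or -1 when no usable GPU is found.
const ArrT = Knet.gpu() >= 0 ? Knet.KnetArray{Float32} : Array{Float32}
to_device(x::AbstractArray) = convert(ArrT, x)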

jonathan-laurent (Owner) commented

Fixed.

ViralBShah (Author) commented

Thanks for fixing this so quickly! The tests are now passing on Mac and Linux for me. On to the GPU next.

Is it expected that there are different numbers of tests? Perhaps I have some optional packages installed on one system and not the other? Anyway, I just thought I would mention it since I saw it.

On Linux:

Test Summary: | Pass  Total
Testing Games | 5239   5239
Test Summary: | Pass  Total
Dummy Runs    |    1      1

On Mac:

Test Summary: | Pass  Total
Testing Games | 5276   5276
Test Summary: | Pass  Total
Dummy Runs    |    1      1

jonathan-laurent (Owner) commented

Yes, the number of tests is nondeterministic right now, as it depends on the length of a random simulated game. :-)
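
(Not the actual test code, just a sketch of why the count varies: each move of a randomly played game contributes its own assertions, so the total depends on the game length.)

using Test
@testset "Testing Games" begin
    nmoves = rand(5200:5300)   # stand-in for the length of a random simulated game
    for move in 1:nmoves
        @test true             # placeholder for the per-move consistency checks
    end
end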

ViralBShah (Author) commented

Trying to get GPU support in Knet working, but running into some issues, which I have filed upstream: denizyuret/Knet.jl#568

Just mentioning it here for anyone who might run into it in the future.
